LinkedIn Skills: Large-Scale Topic Extraction
and Inference
Mathieu Bastian
LinkedIn Corporation ©2014 All Rights Reserved
The World’s Largest Professional Network
313M+ Members Worldwide
2 new Members Per Second
100M+ Monthly Unique Visitors
3M+ Company Pages
Connecting Talent and Opportunity. At scale…
LinkedIn Profile
• 313M+ profiles in 200+ countries
• Organized into sections
– Standardized: Companies, Titles, Industry, Location, etc.
– Unstandardized: Text (Summary, Position description, Specialties)
• Skills & Endorsements section
– Introduced in 2011
– Limited to 50 skills per profile
Skills at LinkedIn
• Key component of the professional identity
• Dictionary of 45k+ skills in English
• Members have diverse skills
– Java Programming
– Ballet
– Politics
– Bow Hunting
• Many of these are long-tail
Figure: Example of a Skills section on a LinkedIn profile
Folksonomy creation
• Create a folksonomy of skills based on LinkedIn profiles
• Leverage the “specialties” section
• Detect comma-separated lists and extract skill phrases
• Use a stop-list and exclude other entities (e.g. companies, titles, degrees)
• 150k skill phrases extracted after removing long-tail noise
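The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the function name, the `min_commas` and `min_count` parameters, and the toy stop-list and entity set are all assumptions made for the example.

```python
from collections import Counter


def extract_skill_phrases(specialties, known_entities, stop_words,
                          min_commas=3, min_count=2):
    """Count candidate skill phrases found in comma-separated specialties text.

    All parameter names and default values are hypothetical.
    """
    counts = Counter()
    for text in specialties:
        # Treat a paragraph as a list only if it contains enough commas.
        if text.count(",") < min_commas:
            continue
        for raw in text.split(","):
            # Normalize whitespace, trailing periods, and case.
            phrase = " ".join(raw.split()).strip(" .").lower()
            if not phrase:
                continue
            # Drop stop-list words and other known entities (companies, titles...).
            if phrase in stop_words or phrase in known_entities:
                continue
            counts[phrase] += 1
    # Remove long-tail noise: keep phrases seen at least min_count times.
    return {p: c for p, c in counts.items() if c >= min_count}
```

With real data the occurrence threshold would be much higher than in this toy setting, and the entity filter would draw on the standardized company, title, and degree lists.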
Disambiguation
• Need to add context to differentiate skill phrases with multiple meanings (e.g. NLP = Natural Language Processing, NLP = Neuro-linguistic programming)
• Different meanings have different sets of related phrases
• Use Jaccard similarity on LinkedIn profiles to find related phrases, then SVD + KMeans to identify clusters of phrases
References: R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463
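As an illustration of the related-phrases idea, the sketch below computes Jaccard similarity between the sets of profiles on which two phrases appear. The data layout (a dict from profile id to a set of phrases) and the function names are assumptions; the subsequent SVD + KMeans clustering step is not shown.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_phrases(profiles, phrase, top_n=3):
    """Rank other phrases by Jaccard similarity of the profiles they occur on.

    `profiles` maps a (hypothetical) profile id to its set of skill phrases.
    """
    members = defaultdict(set)
    for profile_id, phrases in profiles.items():
        for p in phrases:
            members[p].add(profile_id)
    target = members.get(phrase, set())
    scored = [(other, jaccard(target, ids))
              for other, ids in members.items() if other != phrase]
    # Keep only phrases that co-occur at least once with the target.
    scored = [(other, score) for other, score in scored if score > 0]
    # Highest similarity first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

An ambiguous phrase such as "nlp" would typically show two groups among its related phrases, which is what the clustering step is meant to separate.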
De-duplication
• Need to group phrases with similar meaning together. Examples:
– Acronyms: B2B, Business to Business
– Synonyms: Java Programming, Java Development
– Typos: Government Liason (for Government Liaison)
• Many of the skill phrases could be tied to a Wikipedia page
• Built a Mechanical Turk (www.mturk.com) task to find the Wikipedia page associated with a skill phrase
Figure: the phrases “Java programming”, “Java development” and “Java” map to http://en.wikipedia.org/wiki/Java_(programming_language) and form one cluster
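Once each phrase has a Wikipedia page, the grouping itself is simple. The sketch below assumes two hypothetical inputs, a phrase-to-page mapping (as the MTurk task might return it) and phrase occurrence counts, and labels each cluster with its most frequent phrase, as described in the speaker notes.

```python
from collections import defaultdict


def group_by_page(phrase_to_page, phrase_counts):
    """Group phrases that map to the same page; label by most frequent phrase.

    Both arguments are hypothetical data structures for illustration.
    """
    clusters = defaultdict(list)
    for phrase, page in phrase_to_page.items():
        clusters[page].append(phrase)
    labeled = {}
    for phrases in clusters.values():
        # Most frequent phrase wins; ties go to the lexicographically larger one.
        label = max(phrases, key=lambda p: (phrase_counts.get(p, 0), p))
        labeled[label] = sorted(phrases)
    return labeled
```

For example, "Java programming", "Java development" and "Java" all pointing at the same page would collapse into one cluster labeled with whichever phrase occurs most often.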
Folksonomy creation summary
• Extraction based on 12M LinkedIn profiles with “specialties”
• Extracted 150k skill phrases
• Clustered related phrases, adding industry context to ambiguous phrases
• De-duplication using MTurk
• Final master list contains 50k skills
Figure: Examples of synonyms of “Microsoft Office”
Inference and Recommendation
Skills Inference and Recommendation
• Goal was to boost skills adoption with a recommender system: “suggested skills”
• Inferring the skills members have is similar to discovering latent attributes in profiles
• Develop a collaborative filtering solution using profile attributes
References: A. Mislove et al. You are who you know: Inferring user profiles in online social networks. R. Jäschke et al. Tag recommendations in folksonomies.
Figures: Skills Typeahead on LinkedIn; Suggested Skills
Features
• Large number of standardized profile attributes (i.e. each can be represented by a unique identifier)
• Members with similar profile attributes are likely to have similar skills (e.g. if you work at Apple, you probably know “Mac OS”)
Type                         Example                    Cardinality
Title (Headline)             Product Manager            Thousands
Function                     Engineering                Dozens
Industry                     Healthcare                 Dozens
Title (Employment Position)  Product Manager            Thousands
Company                      LinkedIn                   Millions
Group membership             Healthcare Professionals   Millions
Skills                       Matlab                     Thousands
Problem
• Calculate the likelihood that a member has a given skill, given their profile attributes
• No direct user similarity metric
• Large number of features (e.g. 3M companies) and 50k classes
the set of profile attributes
the folksonomy of skills
Naïve Bayes Classifier
• Used a Naïve Bayes classifier to produce inferred skills
• Training data based on members who already have skills
• Result is a ranking of inferred skills, which can be used directly in “suggested skills”
• Evaluation methodology
– AUC for each skill
– P@k and Recall for evaluating the recommendations
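The slide's formula did not survive extraction, so the following is only a sketch of how a Naïve Bayes ranking over profile attributes could look. The multinomial-style likelihood, the Laplace smoothing parameter `alpha`, the class name, and the `"type:value"` attribute encoding are all assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


class SkillNaiveBayes:
    """Toy Naive Bayes ranker: score(skill) = log P(skill) + sum log P(attr | skill)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing (assumed)
        self.skill_counts = Counter()           # members per skill
        self.attr_counts = defaultdict(Counter) # attribute counts per skill
        self.vocabulary = set()                 # all attributes seen in training
        self.n_members = 0

    def fit(self, members):
        """`members` is an iterable of (attributes, skills) pairs of sets."""
        for attributes, skills in members:
            self.n_members += 1
            self.vocabulary.update(attributes)
            for skill in skills:
                self.skill_counts[skill] += 1
                for attr in attributes:
                    self.attr_counts[skill][attr] += 1
        return self

    def rank(self, attributes, top_k=5):
        """Return the top_k skills for a member with the given attributes."""
        vocab_size = len(self.vocabulary)
        scores = {}
        for skill, n_skill in self.skill_counts.items():
            score = math.log(n_skill / self.n_members)  # log prior
            total = sum(self.attr_counts[skill].values())
            for attr in attributes:
                if attr not in self.vocabulary:
                    continue  # ignore attributes never seen in training
                smoothed = self.attr_counts[skill][attr] + self.alpha
                score += math.log(smoothed / (total + self.alpha * vocab_size))
            scores[skill] = score
        # Highest score first; ties broken alphabetically.
        return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]
```

Working in log space keeps the product of many small probabilities numerically stable, and the sorted output can be used directly as a list of suggestions.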
Evaluation
• Evaluate how well we can predict the skills members have
Figures: ROC of skill “Hadoop”; Distribution of ROC across all skills
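For reference, the metrics named in the evaluation methodology can be computed as in the sketch below. The function names and input shapes are assumptions; the AUC here uses the pairwise-ranking definition (probability that a positive example scores above a negative one), which is equivalent to the area under the ROC curve.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended skills the member actually has."""
    if k <= 0:
        return 0.0
    hits = sum(1 for skill in ranked[:k] if skill in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the member's skills recovered in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for skill in ranked[:k] if skill in relevant)
    return hits / len(relevant)


def auc(scores, labels):
    """AUC for one skill: P(score of a positive > score of a negative).

    Ties count as half a win. Quadratic in the number of examples,
    so suitable only for small illustrative inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

P@k and recall are computed per member over the ranked suggestions, while AUC is computed per skill over members' scores for that skill.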
Results
• 12X improvement in conversion using “suggested skills”
Figure: conversion without “suggested skills” vs. with “suggested skills”
Our Contributions
• End-to-end creation of a skills folksonomy based on the free-text specialties section
• Efficient inferred skills model with good offline performance
• Skills recommender system based on profile attributes
Thank You

LinkedIn Skills: RecSys Conference 2014


Editor's notes

  1. Skills are a key component of the member’s professional identity. It’s very important to have a broad and compelling dictionary of skills so members can express their competencies and recruiters can find members for those skills. Today, the dictionary contains more than 45k skills. These include the things most people expect, such as PowerPoint, Matlab or Public Speaking, but also soft skills and rare skills. In fact, the occurrences of skills follow a long-tail distribution. The top 5,000 skills are enough to cover 95% of occurrences. In other words, most of our skills are rare. Yet they are important, as members expect all industries to be represented in detail. It’s important to note that our definition of skills goes beyond skills in the strict sense and also includes areas of expertise. For instance, Natural Gas is not a skill but is a valid area of expertise one might want to add to their profile.
  2. When we started looking at this problem, it didn’t take us much time to realize that we couldn’t leverage any existing list of skills out there, mostly because they weren’t broad enough. Instead, we decided to extract these skills directly from profiles and create a master list. We knew we would face challenges such as duplicates and disambiguation, but at least we knew that free-text extraction had been done before and that the list would be based on members’ data. At the time, LinkedIn had a “specialties” section on the profile. It was free text, but we noticed that members would often enumerate keywords, which often were skills. We built a simple algorithm that would count the number of commas in a paragraph to decide whether it was a comma-separated list. After extracting phrases, we removed other known entities such as titles or companies. Fortunately, LinkedIn possesses this data as well and it wasn’t too difficult to filter them out. Some cases were in the grey zone, though. For instance, Computer Science is both a skill and a field of study. Eventually, this process created about 150k skill phrases. We used a minimum threshold of 20 occurrences.
  3. Then, we tackled the problem of disambiguating these skill phrases. Many of them can have multiple meanings, especially abbreviations and acronyms. For instance, NLP can mean either Natural Language Processing or Neuro-Linguistic Programming. There is no right or wrong answer, and we should be equipped with the tools to recognize one or the other based on the context. A common solution to this problem is to use the set of related phrases. The intuition is that two different meanings would have different sets of related phrases. For instance, here you can see the related phrases of two meanings of “Angels”. We define how skill phrases are related using Jaccard similarity on LinkedIn profiles.
  4. The other important issue with folksonomies is duplicates. I’ve listed here a few of the common patterns: acronyms, abbreviations, synonyms and typos. There are some data mining techniques to help cluster those phrases together, but we started with something even simpler than that. During a small-scale experiment, we observed that a majority of skill phrases could be tied to a Wikipedia page. We then built an MTurk task which asked turkers to find the Wikipedia page associated with a phrase. Finally, phrases that mapped to the same Wikipedia page were grouped together and the most frequent phrase was chosen as the label.
  5. Once we had a good skills master list, it was released and members were allowed to add skills to their profile using a typeahead. Our goal, though, was to maximize the number of members with skills on LinkedIn, so we looked for ways to suggest profile edits and designed a prompt that we named “suggested skills”. The user would be asked whether they have these skills or not. This problem is quite similar to the discovery of latent attributes in profiles. In other words, you are inferring the attributes of an incomplete profile using the rest of the profile, or any other information available. Our goal was to have recommendations even if the user had no skills on their profile, so the algorithm would have to be based on something other than previously added skills. Just recommending popular skills wouldn’t be very relevant either. Using the member’s network is a good idea, but some members have small networks and our goal was to maximize coverage. Finally, we looked at using standardized profile attributes to bootstrap our inference algorithm.
  6. Each profile is composed of text but also of standardized entities such as title, function, industry, field of study, etc. Coverage varies across these attributes. Some are very frequent, such as industry, and some are rarer (e.g. group membership). We identified all attributes that could be predictive in terms of skills.
  7. Our goal was then to model this problem and find a classification method to infer the likelihood that a member has a skill. The number of features was quite large, so we needed a system that would scale easily. As mentioned, we don’t have a unique user similarity metric but instead a list of different profile attributes that, when shared, can predict the likelihood of skills. Each member can have a different set of attributes. Some users have only an industry; others have multiple companies, multiple titles, etc.
  8. What are the true positives and stuff