SlideShare a Scribd company logo
1 of 30
LEVERAGING
Dylan Hogg - Head of Data Science
MACHINE LEARNING
FOR COMPETITIVE
February 2016
ADVANTAGE
OUTLINE
• About me and Search Party
• Scalable Data Platform: Hadoop and Spark
• Machine Learning: What and Why?
• Two Different Applications of Machine Learning
2
ABOUT ME
• Head of Data Science at Search Party (3 years)
• Software Engineering, Database and BI experience
(15 years)
• Postgrad study in Machine Learning from UNSW (2012)
& B.Comm from Auckland University (1999)
3
ABOUT SEARCH PARTY
4
Recruitment Global
Marketplace
• A marketplace that connects
employers, recruiters and
candidates
• Offices in Australia, UK
and Canada
• Employers search millions
of anonymised candidates
and engage with selected
recruiters to fill a vacancy
SEARCH PARTY MISSION
Leverage cutting edge
technology to evolve recruitment
into an efficient digital process
that provides a better outcome
for all three parties involved -
recruiters, employers and job
seekers.
5
SCALABLE
DATA
PLATFORM
6
6
SEARCH PARTY DATA PLATFORM
7
15 million candidate
resumes
From over 800
recruiters
With over 80 million
role descriptions
Job titles
Job descriptions
Skills
Company names
Universities
Executive summaries
Qualifications
Industries
SCALABLE DATA PLATFORM
8
• Cloudera distribution of Hadoop hosted with Rackspace
• Adopted Spark in 2014 for fast data processing
• One platform for prototype and production algorithms
• Spark has Scala, Java, Python and R language support
MACHINE
LEARNING
9
9
(a branch of artificial intelligence)
WHAT IS MACHINE LEARNING?
10
• Origins in computer science
and statistics
• Algorithms that can learn a
specific task from data
without being explicitly
programmed
• Supervised learning to make
predictions
• Unsupervised learning to
discover patterns
WHY MACHINE LEARNING?
11
• Machine learning provides a
set of tools to extract value
and insights from data
• Creates a barrier to entry with
data driven products and
services
• With more data, machine
learning models can become
more accurate
12
MACHINE
LEARNING
Algorithm 1: Detecting
Duplicate candidates
DETECTING DUPLICATE CANDIDATES
– THE PROBLEM
13
• Why does Search Party need to detect duplicate
candidates?
– Candidates may be represented by multiple recruiters
– We resolve duplicate candidate resumes to a single
candidate in our marketplace
Joe Bloggs resume from Recruiter 1
Joe Bloggs resume from Recruiter 2
Joe Bloggs in the marketplace
DETECTING DUPLICATE CANDIDATES
– APPROACH
14
• The naïve implementation scales
poorly with candidate number
• Several versions of “smart”
implementations
• Final version: custom clustering
algorithm in Spark
– De-duplicate 15M candidates in 9 hours on 8 node cluster
– Algorithm considers full name, email, phone, skills and list of
employers
DETECTING DUPLICATE CANDIDATES
– EXAMPLE
15
Email Full Name Phone Skills Employers
j.blogs@gmail.com Joe Bloggs 0409 123 456 Budgeting
External Audit
Forecasting
General Ledger
Intuit Quicken
Wrigley Company Pty Ltd
PepsiCo Australia Holdings
Austbrokers Holdings Pty
jblogs@hotmail.com Joseph
Bloggs
0409 555 555 Accruals
External Audit
General Ledger
PepsiCo Australia Holdings
Austbrokers Holdings Pty
jblogs@hotmail.com Joe Bloggs +61
0409555555
Accruals
General Ledger
AUSTBROKERS
HOLDINGS PTY LTD
DETECTING DUPLICATE CANDIDATES
– ALGORTIHM
16
1. Tokenise candidate fields e.g. “Dylan” -> dy, yl, la, an
2. Compute vectors from tokens for each candidate field
(using TD-IDF)
3. Fast Canopy Clustering1 on candidate full name only
4. Slower Correlation Clustering2 on each canopy using all
candidate fields
5. Main platform uses results to resolve duplicates
1 McCallum et al (2000) "Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching”
2 Elsner et al (2009) "Bounding and Comparing Methods for Correlation Clustering Beyond ILP"
17
 Solves an important business problem of grouping
duplicate candidates in marketplace search results
 This ability is a barrier to entry for potential competitors
 Improves candidate reporting and analysis
 Coming soon: Allowing recruiters to de-duplicate their
CRM database
HOW DOES CANDIDATE “DEDUPE” GIVE
SEARCH PARTY A COMPETIVE ADVANTAGE?
DEEP
LEARNING
18
18
(a branch of machine learning)
WHAT IS DEEP LEARNING?
19
• Neural networks with many hidden
layers
• Use of GPU processors for
performance
• Requires lots of training data
• Slow to train, fast to make predictions
DEEP
LEARNING
Algorithm 2: Job Title
Resolution
UI
designer
UX
designer
User
Interface
designer
95%
80%
80%
20
21
JOB TITLE RESOLUTION – THE PROBLEM
• 6 million distinct job titles extracted
from resumes
• Most common 25 job titles cover
10% of records
• Want to resolve distinct titles that
make up the long tail
• E.g. “Chief Data Officer - Europe”
with “Interim CDO”
with “CDO - Financial Services”
22
JOB TITLE RESOLUTION – TRAINING DATA
• Training data from a partner who has millions of public
vacancies
• Training example: Raw text vacancy description with
corresponding job title
Vacancy Description
We are seeking a Chief
Data Officer who can
manage standard data
operating procedures,
data quality standards
as well as being able
to understand how to
combine different data
sources within the
organisation with each
other.
Chief Data Officer
23
JOB TITLE RESOLUTION – PREDICTION
Prediction Output: Probabilities for a list of job titles
Chief Data Officer - Asia
Probability Suggested Job Title
80% Chief Data Officer
75% CDO
60% Data Officer
50% Data Quality Officer
40% Senior Data Analyst
24
JOB TITLE RESOLUTION
– PREPARING INPUTS
• Neutral network input must be numerical vectors
representing words
• Create word vectors using Google’s word2vec1
algorithm
• Enables vector arithmetic on words:
1 Mikolov et al (2013) "Efficient Estimation of Word Representations in Vector Space”
King
Queen
+ Woman
- Man
Paris
Warsaw
+ Poland
- France
25
JOB TITLE RESOLUTION
– DEEP LEARNING NETWORK
• Training the network learns parameters that allows it to
to correctly make predictions
• Training uses a vacancy description and correct job title
as well as an incorrect job title
• Adapted into a ranking problem to try and rank the
correct job title above the incorrect one
26
JOB TITLE RESOLUTION
– EXAMPLES
Resolution handles abbreviations, re-orderings, synonyms
and misspellings.
Chief Information Officer - Europe
Chief Information Officer
Software engineer - contract
Software engineer
CIO
Interim Chief
Information Officer
Software Engineer
.NET
Software Engineer II
27
HOW DOES JOB TITLE RESOLUTION GIVE
SEARCH PARTY A COMPETITIVE ADVANTAGE?
 Improvements to our candidate search engine via
improved understanding of user search queries
 Increased chance of a successful candidate placement
for a vacancy
 Data cleansing enables linking records with external
data sources
SUMMARY
• Machine learning algorithms give Search Party the ability to
extract value from our data and build great data products
• These products provide a better outcome for all parties
involved - recruiters, employers and job seekers
• Our scalable machine learning algorithms increase the barrier
to entry for any potential competitors
• As our data grows, supervised machine learning algorithms
improve and this directly benefits our platform
28
KEY TAKEAWAY
Algorithms are as important as Data
and are a key competitive advantage for
your organisation
29
Questions?
Connect with me…
www.linkedin.com/in/dylanhogg
@dylanhogg
www.thesearchparty.com

More Related Content

Similar to Leveraging Machine Learning for Competitive Advantage at Search Party

LMFAO Leveraging Machines for Awesome Outreach
LMFAO  Leveraging Machines for Awesome OutreachLMFAO  Leveraging Machines for Awesome Outreach
LMFAO Leveraging Machines for Awesome OutreachGareth Simpson
 
Building Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data ScientistBuilding Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data ScientistProduct School
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and MLQuantUniversity
 
Hrm activities in a company
Hrm activities in a companyHrm activities in a company
Hrm activities in a companyrinuemma
 
SA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterpriseSA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterpriseDavid Graham
 
Delivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMDelivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMProduct School
 
Digicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdfDigicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdfitsmeankitkhan
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Benjamin Le
 
Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now Cornerstone OnDemand
 
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...AgileNetwork
 
Automated Placement System
Automated Placement SystemAutomated Placement System
Automated Placement SystemIRJET Journal
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorBigML, Inc
 
Everyday Data Science
Everyday Data ScienceEveryday Data Science
Everyday Data SciencePaul Laughlin
 
Muhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET JobMuhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET JobMuhammad Shafique
 
Lead confluent HQ Dec 2019
Lead   confluent HQ Dec 2019Lead   confluent HQ Dec 2019
Lead confluent HQ Dec 2019Sabri Skhiri
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning India Quotient
 
Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...finalyearproject61
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab CreateTuri, Inc.
 

Similar to Leveraging Machine Learning for Competitive Advantage at Search Party (20)

LMFAO Leveraging Machines for Awesome Outreach
LMFAO  Leveraging Machines for Awesome OutreachLMFAO  Leveraging Machines for Awesome Outreach
LMFAO Leveraging Machines for Awesome Outreach
 
Building Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data ScientistBuilding Scalable ML Products by TripAdvisor PM & Data Scientist
Building Scalable ML Products by TripAdvisor PM & Data Scientist
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Hrm activities in a company
Hrm activities in a companyHrm activities in a company
Hrm activities in a company
 
SA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterpriseSA 2014 - Integrating the heterogeneous enterprise
SA 2014 - Integrating the heterogeneous enterprise
 
Wims2012
Wims2012Wims2012
Wims2012
 
Delivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PMDelivering Machine Learning Solutions by fmr Sears Dir of PM
Delivering Machine Learning Solutions by fmr Sears Dir of PM
 
Digicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdfDigicrome Data Science & AI 11 Month Course PDF.pdf
Digicrome Data Science & AI 11 Month Course PDF.pdf
 
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
Personalized Job Recommendation System at LinkedIn: Practical Challenges and ...
 
Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now Leveraging The Machine: The Future of People Data Is Now
Leveraging The Machine: The Future of People Data Is Now
 
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
Agile Mumbai 2022 - Sriya Khandelwal | Reimagining the Application of AI in L...
 
Automated Placement System
Automated Placement SystemAutomated Placement System
Automated Placement System
 
DutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive SectorDutchMLSchool. ML for Energy Trading and Automotive Sector
DutchMLSchool. ML for Energy Trading and Automotive Sector
 
Everyday Data Science
Everyday Data ScienceEveryday Data Science
Everyday Data Science
 
Muhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET JobMuhammad Shafique CV for .NET Job
Muhammad Shafique CV for .NET Job
 
Lead confluent HQ Dec 2019
Lead   confluent HQ Dec 2019Lead   confluent HQ Dec 2019
Lead confluent HQ Dec 2019
 
Google Cloud Machine Learning
 Google Cloud Machine Learning  Google Cloud Machine Learning
Google Cloud Machine Learning
 
Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...Lab 64 - Python Sktime for time series analysis in python with visualization ...
Lab 64 - Python Sktime for time series analysis in python with visualization ...
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Leveraging Machine Learning for Competitive Advantage at Search Party

  • 1. LEVERAGING Dylan Hogg - Head of Data Science MACHINE LEARNING FOR COMPETITIVE February 2016 ADVANTAGE
  • 2. OUTLINE • About me and Search Party • Scalable Data Platform: Hadoop and Spark • Machine Learning: What and Why? • Two Different Applications of Machine Learning 2
  • 3. ABOUT ME • Head of Data Science at Search Party (3 years) • Software Engineering, Database and BI experience (15 years) • Postgrad study in Machine Learning from UNSW (2012) & B.Comm from Auckland University (1999) 3
  • 4. ABOUT SEARCH PARTY 4 Recruitment Global Marketplace • A marketplace that connects employers, recruiters and candidates • Offices in Australia, UK and Canada • Employers search millions of anonymised candidates and engage with selected recruiters to fill a vacancy
  • 5. SEARCH PARTY MISSION Leverage cutting edge technology to evolve recruitment into an efficient digital process that provides a better outcome for all three parties involved - recruiters, employers and job seekers. 5
  • 7. SEARCH PARTY DATA PLATFORM 7 15 million candidate resumes From over 800 recruiters With over 80 million role descriptions Job titles Job descriptions Skills Company names Universities Executive summaries Qualifications Industries
  • 8. SCALABLE DATA PLATFORM 8 • Cloudera distribution of Hadoop hosted with Rackspace • Adopted Spark in 2014 for fast data processing • One platform for prototype and production algorithms • Spark has Scala, Java, Python and R language support
  • 9. MACHINE LEARNING 9 9 (a branch of artificial intelligence)
  • 10. WHAT IS MACHINE LEARNING? 10 • Origins in computer science and statistics • Algorithms that can learn a specific task from data without being explicitly programmed • Supervised learning to make predictions • Unsupervised learning to discover patterns
  • 11. WHY MACHINE LEARNING? 11 • Machine learning provides a set of tools to extract value and insights from data • Creates a barrier to entry with data driven products and services • With more data, machine learning models can become more accurate
  • 13. DETECTING DUPLICATE CANDIDATES – THE PROBLEM 13 • Why does Search Party need to detect duplicate candidates? – Candidates may be represented by multiple recruiters – We resolve duplicate candidate resumes to a single candidate in our marketplace Joe Bloggs resume from Recruiter 1 Joe Bloggs resume from Recruiter 2 Joe Bloggs in the marketplace
  • 14. DETECTING DUPLICATE CANDIDATES – APPROACH 14 • The naïve implementation scales poorly with candidate number • Several versions of “smart” implementations • Final version: custom clustering algorithm in Spark – De-duplicate 15M candidates in 9 hours on 8 node cluster – Algorithm considers full name, email, phone, skills and list of employers
  • 15. DETECTING DUPLICATE CANDIDATES – EXAMPLE 15 Email Full Name Phone Skills Employers j.blogs@gmail.com Joe Bloggs 0409 123 456 Budgeting External Audit Forecasting General Ledger Intuit Quicken Wrigley Company Pty Ltd PepsiCo Australia Holdings Austbrokers Holdings Pty jblogs@hotmail.com Joseph Bloggs 0409 555 555 Accruals External Audit General Ledger PepsiCo Australia Holdings Austbrokers Holdings Pty jblogs@hotmail.com Joe Bloggs +61 0409555555 Accruals General Ledger AUSTBROKERS HOLDINGS PTY LTD
  • 16. DETECTING DUPLICATE CANDIDATES – ALGORTIHM 16 1. Tokenise candidate fields e.g. “Dylan” -> dy, yl, la, an 2. Compute vectors from tokens for each candidate field (using TD-IDF) 3. Fast Canopy Clustering1 on candidate full name only 4. Slower Correlation Clustering2 on each canopy using all candidate fields 5. Main platform uses results to resolve duplicates 1 McCallum et al (2000) "Efficient Clustering of High Dimensional Data Sets with Application to Reference Matching” 2 Elsner et al (2009) "Bounding and Comparing Methods for Correlation Clustering Beyond ILP"
  • 17. 17  Solves an important business problem of grouping duplicate candidates in marketplace search results  This ability is a barrier to entry for potential competitors  Improves candidate reporting and analysis  Coming soon: Allowing recruiters to de-duplicate their CRM database HOW DOES CANDIDATE “DEDUPE” GIVE SEARCH PARTY A COMPETIVE ADVANTAGE?
  • 18. DEEP LEARNING 18 18 (a branch of machine learning)
  • 19. WHAT IS DEEP LEARNING? 19 • Neural networks with many hidden layers • Use of GPU processors for performance • Requires lots of training data • Slow to train, fast to make predictions
  • 20. DEEP LEARNING Algorithm 2: Job Title Resolution UI designer UX designer User Interface designer 95% 80% 80% 20
  • 21. 21 JOB TITLE RESOLUTION – THE PROBLEM • 6 million distinct job titles extracted from resumes • Most common 25 job titles cover 10% of records • Want to resolve distinct titles that make up the long tail • E.g. “Chief Data Officer - Europe” with “Interim CDO” with “CDO - Financial Services”
  • 22. 22 JOB TITLE RESOLUTION – TRAINING DATA • Training data from a partner who has millions of public vacancies • Training example: Raw text vacancy description with corresponding job title Vacancy Description We are seeking a Chief Data Officer who can manage standard data operating procedures, data quality standards as well as being able to understand how to combine different data sources within the organisation with each other. Chief Data Officer
  • 23. 23 JOB TITLE RESOLUTION – PREDICTION Prediction Output: Probabilities for a list of job titles Chief Data Officer - Asia Probability Suggested Job Title 80% Chief Data Officer 75% CDO 60% Data Officer 50% Data Quality Officer 40% Senior Data Analyst
  • 24. 24 JOB TITLE RESOLUTION – PREPARING INPUTS • Neutral network input must be numerical vectors representing words • Create word vectors using Google’s word2vec1 algorithm • Enables vector arithmetic on words: 1 Mikolov et al (2013) "Efficient Estimation of Word Representations in Vector Space” King Queen + Woman - Man Paris Warsaw + Poland - France
  • 25. 25 JOB TITLE RESOLUTION – DEEP LEARNING NETWORK • Training the network learns parameters that allows it to to correctly make predictions • Training uses a vacancy description and correct job title as well as an incorrect job title • Adapted into a ranking problem to try and rank the correct job title above the incorrect one
  • 26. 26 JOB TITLE RESOLUTION – EXAMPLES Resolution handles abbreviations, re-orderings, synonyms and misspellings. Chief Information Officer - Europe Chief Information Officer Software engineer - contract Software engineer CIO Interim Chief Information Officer Software Engineer .NET Software Engineer II
  • 27. 27 HOW DOES JOB TITLE RESOLUTION GIVE SEARCH PARTY A COMPETITIVE ADVANTAGE?  Improvements to our candidate search engine via improved understanding of user search queries  Increased chance of a successful candidate placement for a vacancy  Data cleansing enables linking records with external data sources
  • 28. SUMMARY • Machine learning algorithms give Search Party the ability to extract value from our data and build great data products • These products provide a better outcome for all parties involved - recruiters, employers and job seekers • Our scalable machine learning algorithms increase the barrier to entry for any potential competitors • As our data grows, supervised machine learning algorithms improve and this directly benefits our platform 28
  • 29. KEY TAKEAWAY Algorithms are as important as Data and are a key competitive advantage for your organisation 29

Editor's Notes

  1. Final v2 10th Feb.
  2. Talk about team and Dr Jan Luts
  3. Airbnb, Uber
  4. Just think about what we do… we allow employers to…….
  5. We apply it in multipl …….extracted two simple aplications …to give examples of
  6. Why important…anecdote…
  7. Cf dupe customers in crm’s
  8. “Easy for brain hard for If then”