Using Deep Learning To
Predict Performance From
Resumes
Ben Taylor, Chief Data Scientist
INTRODUCTIONS
Ben Taylor @bentaylordata
Background Personal
• Sequoia Capital
• Largest Video Interviewing Platform
• Forbes #10 most promising companies
• Global: 189 countries
NATURAL LANGUAGE
PROCESSING (NLP)
GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
1     55          80          95%
0     75          10          22%
0     50          20          57%
1     20          90          91%
0     40          60          11%
Basic Tutorial On How To Build A Numeric Feature Model
BUILDING A MODEL
ESSAY                       GRIT  MOTIVATION  ENGAGEMENT  PERFORMANCE
I want to work here         1     55          80          95%
I have great teamwork       0     75          10          22%
Synergy                     0     50          20          57%
I have so much grit         1     20          90          91%
They fired that individual  0     40          60          11%
Now what?!?
BUILDING A MODEL
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
There are really two options: mapping or tokenizing
BUILDING A MODEL
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize:
Female = 1
Male = 1
Female Male
1 0
0 1
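The two options on this slide can be sketched in a few lines of plain Python. This is only an illustration: the helper names `map_ordinal` and `one_hot` are made up for this example, and the category values are the ones shown on the slide.

```python
# Mapping: categories with a natural order collapse into one integer column.
ORDER = {"Bad": 0, "Good": 1, "Better": 2, "Best": 3}


def map_ordinal(values):
    """Replace each ordered category with its integer rank."""
    return [ORDER[v] for v in values]


def one_hot(values):
    """Tokenizing: unordered categories get one 0/1 column per distinct value."""
    columns = sorted(set(values))
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


print(map_ordinal(["Good", "Best", "Bad"]))  # [1, 3, 0]

columns, rows = one_hot(["Female", "Male"])
print(columns)  # ['Female', 'Male']
print(rows)     # [[1, 0], [0, 1]]
```

Mapping suits values where "Best" really is more than "Good"; tokenizing avoids implying an order between values such as Female and Male.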
I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%
Tokenize the text into unique word columns
BUILDING A MODEL
ESSAY PERFORMANCE
I want to work here 95%
I have great teamwork 22%
Synergy 57%
I have so much grit 91%
They fired that individual 11%
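A minimal sketch of how the word columns could be produced from the essays, assuming simple lowercase whitespace tokenization (the talk does not say which tokenizer was used). The function names are hypothetical. The slide shows only the first seven columns; a full vocabulary also has columns for words such as "synergy".

```python
def build_vocabulary(texts):
    """Map each distinct lowercase word to a column index, in first-seen order."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def bag_of_words(texts, vocab):
    """Return one 0/1 row per text: 1 if the word appears in it, else 0."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for word in text.lower().split():
            if word in vocab:
                row[vocab[word]] = 1
        rows.append(row)
    return rows


essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]
vocab = build_vocabulary(essays)
X = bag_of_words(essays, vocab)

print(list(vocab)[:7])         # ['i', 'want', 'to', 'work', 'here', 'have', 'great']
print([row[:7] for row in X])  # the first seven columns, as on the slide
```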
I  want  to  work  here  have  great  PERF.
1  1     1   1     1     0     0      95%
1  0     0   0     0     1     1      22%
0  0     0   0     0     0     0      57%
1  0     0   0     0     1     0      91%
0  0     0   0     0     0     0      11%
Bag-of-words modeling: sequence and ordering are lost
BUILDING A MODEL
I want   want to   to go   work here   PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
Band-Aid: Concept of n-grams
BUILDING A MODEL
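As a rough illustration of the n-gram band-aid, the sketch below builds word bigrams with plain Python. The `ngrams` helper and the "dog bites man" sentences are assumptions made for this example, not material from the talk.

```python
def ngrams(text, n=2):
    """Return the word n-grams of a text, in order of appearance."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngrams("I want to work here"))
# ['i want', 'want to', 'to work', 'work here']

# Two texts with the same bag of words but different meaning:
a, b = "dog bites man", "man bites dog"
print(sorted(a.split()) == sorted(b.split()))  # True: unigram features are identical
print(ngrams(a) == ngrams(b))                  # False: bigrams keep local word order
```

Bigrams recover only local ordering, and they multiply the number of columns, which is why the slide calls them a band-aid.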
SENTIMENT EXAMPLE
(binary)
We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS
label text
neg @Christian_Rocha i miss u!!!!!
pos @llanitos there's still some St Werburghs hone...
pos @Ashley96 it's me
neg @Phillykidd we use to be like bestfriends
neg Just got back from Manchester. I went to the T...
pos @LauraDark thnks x el rt
neg "Ughh it's so hot & the singing lady is st...
neg @hnprashanth @dkris I was out to my native for...
pos Girls night with the bests Wish you were here J!
neg Just watched @paulkehler rock the crap out of ...
pos i got the gurl! i got the ride! now im just on...
pos @ninthspace how is the table building going?
pos by d way guyz I must log out na see u again to...
neg @dreday11 its only 20 mins...
Sentiment140
cs.stanford.edu
:( :)
Before we can process this, we need to format it properly
SENTIMENT DATASET - FORMATTING
text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...
Python list
Now we can go all the way to model training and prediction
SENTIMENT DATASET – UNIGRAM
y
[0,1,0,1,1]
text_data
[['this is a tweet'],
 ['sounds good'],
 ['not really']]
I want to work here have great
1 1 1 1 1 0 0
1 0 0 0 0 1 1
0 0 0 0 0 0 0
1 0 0 0 0 1 0
0 0 0 0 0 0 0
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BIGRAM
I want   want to   to go   work here
1 1 1 1 1
1 0 0 0 0
0 0 0 0 0
1 0 0 0 0
0 0 0 0 0
text_data
[['this is a tweet'],
 ['sounds good'],
 ['not really']]
y
[0,1,0,1,1]
BUILDING A MODEL
Convert labels to integers
SENTIMENT DATASET - FORMATTING
Python int array
label
neg
pos
pos
neg
neg
pos
neg
neg
pos
neg
pos
pos
pos
neg
Convert labels to integers
SENTIMENT DATASET - FORMATTING
model.fit(X, y)
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
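To make the `model.fit(X, y)` step concrete, here is a small stand-in classifier written in plain Python. It is a multinomial Naive Bayes with add-one smoothing, chosen only because it fits in a few lines; the talk does not say which model was used, and the class name `TinyNaiveBayes` and the toy count matrix are invented for this sketch.

```python
import math


class TinyNaiveBayes:
    """Multinomial Naive Bayes over word-count rows (illustrative stand-in)."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_ = {}
        self.log_likelihood_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            # Per-column word counts for this class, with add-one smoothing.
            totals = [sum(column) + 1 for column in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood_[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            scores = {
                c: self.log_prior_[c]
                + sum(n * ll for n, ll in zip(x, self.log_likelihood_[c]))
                for c in self.classes_
            }
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy count matrix: column 0 appears in positive rows, column 1 in negative rows.
X = [[2, 0, 1], [1, 0, 0], [0, 2, 0], [0, 1, 1]]
y = [1, 1, 0, 0]

model = TinyNaiveBayes().fit(X, y)
print(model.predict([[1, 0, 0], [0, 1, 0]]))  # [1, 0]
```

The same `fit` and `predict` shape is what most Python modeling libraries expose, so swapping in a stronger model should not change the surrounding code much.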
Now we can go all the way to model training and prediction
SENTIMENT DATASET – BUILD A MODEL
y
[0,1,0,1,1]
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
PERFORMANCE?
DON’T CHEAT!
PROPER MODEL VALIDATION
We need to hold out data we can test against; this is called your validation set
SENTIMENT DATASET – VALIDATION
Train on 20%, test on 80%
SENTIMENT DATASET – VALIDATION
20% 80%
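A hold-out split like the one on this slide can be sketched with the standard library alone. The function name and the fixed `seed` are assumptions for the example; `test_fraction=0.8` reproduces the "train on 20%, test on 80%" case.

```python
import random


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices once, then cut them into a test part and a train part."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Toy data: ten rows whose label is the parity of the row value.
X = list(range(10))
y = [value % 2 for value in X]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.8)
print(len(X_train), len(X_test))  # 2 8
```

The model never sees the test rows during fitting, so a score on them is an honest estimate. The smaller the test part gets, the noisier that estimate becomes.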
Best score yet
SENTIMENT DATASET – VALIDATION
60% 40%
Best score yet
SENTIMENT DATASET – VALIDATION
70% 30%
Best score yet
SENTIMENT DATASET – VALIDATION
80% 20%
Best score yet
SENTIMENT DATASET – VALIDATION
99% 1%
Perfect scores
SENTIMENT DATASET – VALIDATION
99.9999% 2
Predict Every Point, k-folding
Folds = 9 Fold = 1 Fold = 2… Y_pred
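The k-folding idea, where every point gets predicted by a model that never saw it, might look like the sketch below. The names `k_fold_indices`, `cross_val_predict` and `majority_baseline` are hypothetical, and the majority-class baseline stands in for whatever model is actually being validated.

```python
def k_fold_indices(n, k):
    """Split range(n) into k disjoint test folds (assumes rows are already shuffled)."""
    return [list(range(n))[i::k] for i in range(k)]


def cross_val_predict(X, y, k, fit_predict):
    """Predict every point exactly once, using a model trained on the other folds."""
    y_pred = [None] * len(X)
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, predictions):
            y_pred[i] = p
    return y_pred


def majority_baseline(X_train, y_train, X_test):
    """Stand-in model: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)


X = [[i] for i in range(20)]
y = [1] * 14 + [0] * 6
y_pred = cross_val_predict(X, y, k=10, fit_predict=majority_baseline)
print(len(y_pred), None in y_pred)  # 20 False
```

Because `y_pred` covers the whole dataset, metrics can be computed on all rows rather than on a single small hold-out slice.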
SENTIMENT DATASET – Validation
10 folds
SENTIMENT DATASET – Validation
100 folds
BIGRAM BOOST
acc: 0.8015
r: 0.2061
AUROC: 0.8738
acc: 0.7809
r: 0.1238
AUROC: 0.8554
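For reference, two of the reported metrics, accuracy and AUROC, can be computed as sketched below. The slides do not define `r`, so it is left out here. The AUROC function uses the pairwise definition (the chance that a random positive outscores a random negative), which is quadratic in the data size and meant only for small illustrations; the toy labels and scores are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """Chance that a random positive scores above a random negative; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]             # model confidence that the label is 1
y_pred = [int(s >= 0.3) for s in scores]   # hard labels at an assumed 0.3 threshold

print(accuracy(y_true, y_pred))  # 0.75
print(auroc(y_true, scores))     # 0.75
```

Accuracy depends on the chosen threshold, while AUROC scores the ranking itself, which is one reason to report both.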
Feature Creation
Model Selection
Feature Reduction
BETTER MODELS
Now (10x average):
acc: 0.8208
r: 0.2832
AUROC: 0.8939
Was:
acc: 0.8015
r: 0.2061
AUROC: 0.8739
EMAIL CLASSIFICATION
(multiclass)
EMAIL MULTICLASS DATASET (20 classes)
alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc
EMAIL MULTICLASS DATASET (20 classes)
From: lerxst@wam.umd.edu (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw\nthe other day. It was a 2-door
sports car, looked to be from the late 60s/\nearly 70s. It was called a Bricklin. The doors were really small. In
addition,\nthe front bumper was separate from the rest of the body. This is \nall I know. If anyone can tellme a
model name, engine specs, years\nof production, where this car is made, history, or whatever info you\nhave on
this funky looking car, please e-mail.\n\nThanks,\n- IL\n ---- brought to you by your neighborhood Lerxst ----
\n\n\n\n\n"
rec.autos
EMAIL MULTICLASS DATASET (20 classes)
From: guykuo@carson.u.washington.edu (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have\nshared their experiences for
this poll. Please send a brief message detailing\nyour experiences with the procedure. Top speed attained, CPU
rated speed,\nadd on cards and adapters, heat sinks, hour of usage per day, floppy disk\nfunctionality with 800
and 1.4 m floppies are especially requested.\n\nI will be summarizing in the next two days, so please add to the
network\nknowledge base if you have done the clock upgrade and haven't answered this\npoll. Thanks.\n\nGuy
Kuo <guykuo@u.washington.edu>\n"
comp.sys.mac.hardware
EMAIL MULTICLASS DATASET (20 classes)
From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000 ?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote:\n> abraxis@iastate.edu writes in article
<abraxis.734340159@class1.iastate.edu>:\n> > Anyone know about the Weitek P9000 graphics chip?\n> As far
as the low-level stuff goes, it looks pretty nice. It's got this\n> quadrilateral fill command that requires just the
four points.\n\nDo you have Weitek's address/phone number? I'd like to get some information\nabout this
chip.\n\n--\nJoe Green\t\t\t\tHarris Corporation\njgreen@csd.harris.com\t\t\tComputer Systems Division\n"The
only thing that really scares me is a person with no sense of humor."\n\t\t\t\t\t\t-- Jonathan Winters\n'
comp.graphics
EMAIL MULTICLASS DATASET (20 classes)
RESUME MODELING
(binary)
Upload Your
Resume
Now painstakingly fill out
this form containing all of
the exact same information
Document modeling review
UNSTRUCTURED
STRUCTURED
MUNGED
Resume Extension
Resume format consolidation
GPA Inclusion (18%)
GPA Replacement
Mimicking the human recruiter
Feature Hunt
ONE FEATURE AT A TIME
INCREMENTAL GAINS
DEEP LEARNING
Unstructured
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USE DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
Structured
I want   want to   to go   work here   PERF.
1 1 1 1 1 95%
1 0 0 0 0 22%
0 0 0 0 0 57%
1 0 0 0 0 91%
0 0 0 0 0 11%
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE; USE DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ESSAY
3 2 1 4 5
3 7 67 345
54
3 7 99 10234
78 203 501 14
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
LSTM
RAW TEXT WORD SEQUENCE
ENCODING
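The word sequence encoding feeding the LSTM can be sketched as follows. This is an illustration under stated assumptions: indices are assigned in first-seen order (the slide's numbers look like they come from a larger, differently ordered vocabulary), index 0 is reserved for padding, and all function names are hypothetical.

```python
def build_word_index(texts):
    """Assign each distinct word an integer id starting at 1; 0 is kept for padding."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index


def encode(text, index):
    """Turn a text into its sequence of word ids, skipping unknown words."""
    return [index[w] for w in text.lower().split() if w in index]


def pad(sequence, length):
    """Left-pad with zeros (or keep only the last ids) to a fixed length."""
    return ([0] * length + sequence)[-length:]


def one_hot(word_id, size):
    """One-hot vector for a word id in 1..size."""
    return [1 if j == word_id else 0 for j in range(1, size + 1)]


essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]
index = build_word_index(essays)

print(encode("I want to work here", index))  # [1, 2, 3, 4, 5]
print(pad(encode("Synergy", index), 5))      # [0, 0, 0, 0, 9]
print(one_hot(3, 5))                         # [0, 0, 1, 0, 0]
```

Unlike the bag-of-words matrix, these sequences keep word order, which is what a sequence model such as an LSTM consumes.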
AUTOMATIC FEATURE GENERATION
BEGIN SCRATCHING AT LAYOUT
AUTOMATIC FEATURE GENERATION (LAYOUT)
CNN:
bit.ly/pacon
INTERVIEW MODELING
Would you ever hire from just a resume?
INTERVIEW MODELING
SOFT/TECHNICAL COMPETENCIES
Resume can overstate and understate
Audio Video Text
QUESTIONS

Deep Learning Predicts Performance From Resumes

  • 1. Using Deep Learning To Predict Performance From Resumes Ben Taylor, Chief Data Scientist
  • 4. • Sequoia Capital • Largest Video Interviewing Platform • Forbes #10 most promising companies • Global: 189 countries
  • 6. GRIT MOTIVATION ENGAGEMENT PERFORMANCE 1 55 80 95% 0 75 10 22% 0 50 20 57% 1 20 90 91% 0 40 60 11% Basic Tutorial On How To Build A Numeric Feature Model BUILDING A MODEL
  • 7. ESSAY GRIT MOTIVATION ENGAGEMENT PERFORMANCE I want to work here 1 55 80 95% I have great teamwork 0 75 10 22% Synergy 0 50 20 57% I have so much grit 1 20 90 91% They fired that individual 0 40 60 11% Now what?!? BUILDING A MODEL
  • 8. ESSAY PERFORMANCE I want to work here 95% I have great teamwork 22% Synergy 57% I have so much grit 91% They fired that individual 11% There are really two different options, mapping or tokenizing BUILDING A MODEL Map: Bad = 0 Good = 1 Better = 2 Best = 3 Tokenize: Female = 1 Male = 1 Female Male 1 0 0 1
  • 9. I want to work here have great PERF. 1 1 1 1 1 0 0 95% 1 0 0 0 0 1 1 22% 0 0 0 0 0 0 0 57% 1 0 0 0 0 1 0 91% 0 0 0 0 0 0 0 11% Tokenize the text into unique word columns BUILDING A MODEL ESSAY PERFORMANCE I want to work here 95% I have great teamwork 22% Synergy 57% I have so much grit 91% They fired that individual 11%
  • 10. I want to work here have great PERF. 1 1 1 1 1 0 0 95% 1 0 0 0 0 1 1 22% 0 0 0 0 0 0 0 57% 1 0 0 0 0 1 0 91% 0 0 0 0 0 0 0 11% Bag of words modeling, sequence and ordering is lost BUILDING A MODEL
  • 11. Bag of words modeling, sequence and ordering is lost BUILDING A MODEL
  • 12. I want Want to to go work here PERF. 1 1 1 1 1 95% 1 0 0 0 0 22% 0 0 0 0 0 57% 1 0 0 0 0 91% 0 0 0 0 0 11% Band-Aid: Concept of n-grams BUILDING A MODEL
  • 14. We need a labeled dataset, sometimes getting one with labels is the biggest challenge of all. SENTIMENT DATASET, 1.5M TWEETS label text neg @Christian_Rocha i miss u!!!!! pos @llanitos there's still some St Werburghs hone... pos @Ashley96 it's me neg @Phillykidd we use to be like bestfriends neg Just got back from Manchester. I went to the T... pos @LauraDark thnks x el rt neg "Ughh it's so hot &amp; the singing lady is st... neg @hnprashanth @dkris I was out to my native for... pos Girls night with the bests Wish you were here J! neg Just watched @paulkehler rock the crap out of ... pos i got the gurl! i got the ride! now im just on... pos @ninthspace how is the table building going? pos by d way guyz I must log out na see u again to... neg @dreday11 its only 20 mins... Sentiment140 cs.stanford.edu :(:)
  • 15. Before we can process this we need to do the proper formatting to get it ready SENTIMENT DATASET - FORMATTING text @Christian_Rocha i miss u!!!!! @llanitos there's still some St Werburghs hone... @Ashley96 it's me @Phillykidd we use to be like bestfriends Just got back from Manchester. I went to the T... @LauraDark thnks x el rt "Ughh it's so hot &amp; the singing lady is st... @hnprashanth @dkris I was out to my native for... Girls night with the bests Wish you were here J! Just watched @paulkehler rock the crap out of ... i got the gurl! i got the ride! now im just on... @ninthspace how is the table building going? by d way guyz I must log out na see u again to... @dreday11 its only 20 mins... Python list
  • 16. Now we can go all the way to model training and prediction SENTIMENT DATASET – UNIGRAM y [0,1,0,1,1] text_data [[‘this is a tweet’] [‘sounds good’] [‘not really’]] I want to work here have great 1 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
  • 17. Now we can go all the way to model training and prediction SENTIMENT DATASET – BIGRAM I want Want to to go work here 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 text_data [[‘this is a tweet’] [‘sounds good’] [‘not really’]] y [0,1,0,1,1]
  • 19. Convert labels to integers SENTIMENT DATASET - FORMATTING Python int array label neg pos pos neg neg pos neg neg pos neg pos pos pos neg
  • 20. Convert labels to integers SENTIMENT DATASET - FORMATTING model.fit(X,Y) X [4,0,0,0,0,7,0,0,1] [0,0,0,0,9,0,0,0,2]
  • 21. Now we can go all the way to model training and prediction SENTIMENT DATASET – BUILD A MODEL y [0,1,0,1,1] X [4,0,0,0,0,7,0,0,1] [0,0,0,0,9,0,0,0,2] PERFORMANCE?
  • 24. We need to hold out data we can test against, this is called your validation set SENTIMENT DATASET – VALIDATION
  • 25. Train on 20%, test on 80% SENTIMENT DATASET – VALIDATION 20% 80%
  • 26. Best score yet SENTIMENT DATASET – VALIDATION 60% 40%
  • 27. Best score yet SENTIMENT DATASET – VALIDATION 70% 30%
  • 28. Best score yet SENTIMENT DATASET – VALIDATION 80% 20%
  • 29. Best score yet SENTIMENT DATASET – VALIDATION 99% 1%
  • 30. Perfect scores SENTIMENT DATASET – VALIDATION 99.9999% 2
  • 31. Predict Every Point, k-folding Folds = 9 Fold = 1 Fold = 2… Y_pred
  • 32. SENTIMENT DATASET – Validation 10 folds
  • 33. SENTIMENT DATASET – Validation 100 folds
  • 34. BIGRAM BOOST acc: 0.8015 r: 0.2061 AUROC: 0.8738 acc: 0.7809 r: 0.1238 AUROC: 0.8554
  • 36. BETTER MODELS acc: 0.8208 r: 0.2832 AUROC: 0.8939 acc: 0.8015 r: 0.2061 AUROC: 0.8739 Was: Now: (10x average)
  • 38. EMAIL MULTICLASS DATASET (20 classes) alt.atheism comp.graphics comp.os.ms-windows.misc comp.sys.ibm.pc.hardware comp.sys.mac.hardware comp.windows.x misc.forsale rec.autos rec.motorcycles rec.sport.baseball rec.sport.hockey sci.crypt sci.electronics sci.med sci.space soc.religion.christian talk.politics.guns talk.politics.mideast talk.politics.misc talk.religion.misc
  • 39. EMAIL MULTICLASS DATASET (20 classes) From: lerxst@wam.umd.edu (where's my thing) Subject: WHAT car is this!? Nntp-Posting-Host: rac3.wam.umd.edu Organization: University of Maryland, College Park Lines: 15 MSG: I was wondering if anyone out there could enlighten me on this car I saw the other day. It was a 2-door sports car, looked to be from the late 60s/early 70s. It was called a Bricklin. The doors were really small. In addition, the front bumper was separate from the rest of the body. This is all I know. If anyone can tell me a model name, engine specs, years of production, where this car is made, history, or whatever info you have on this funky looking car, please e-mail. Thanks, - IL ---- brought to you by your neighborhood Lerxst ---- Label: rec.autos
  • 40. EMAIL MULTICLASS DATASET (20 classes) From: guykuo@carson.u.washington.edu (Guy Kuo) Subject: SI Clock Poll - Final Call Summary: Final call for SI clock reports Keywords: SI,acceleration,clock,upgrade Article-I.D.: shelley.1qvfo9INNc3s Organization: University of Washington Lines: 11 NNTP-Posting-Host: carson.u.washington.edu MSG: A fair number of brave souls who upgraded their SI clock oscillator have shared their experiences for this poll. Please send a brief message detailing your experiences with the procedure. Top speed attained, CPU rated speed, add on cards and adapters, heat sinks, hour of usage per day, floppy disk functionality with 800 and 1.4 m floppies are especially requested. I will be summarizing in the next two days, so please add to the network knowledge base if you have done the clock upgrade and haven't answered this poll. Thanks. Guy Kuo <guykuo@u.washington.edu> Label: comp.sys.mac.hardware
  • 41. EMAIL MULTICLASS DATASET (20 classes) From: jgreen@amber (Joe Green) Subject: Re: Weitek P9000 ? Organization: Harris Computer Systems Division Lines: 14 Distribution: world NNTP-Posting-Host: amber.ssd.csd.harris.com X-Newsreader: TIN [version 1.1 PL9] MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote: > abraxis@iastate.edu writes in article <abraxis.734340159@class1.iastate.edu>: > > Anyone know about the Weitek P9000 graphics chip? > As far as the low-level stuff goes, it looks pretty nice. It's got this quadrilateral fill command that requires just the four points. Do you have Weitek's address/phone number? I'd like to get some information about this chip. -- Joe Green, Harris Corporation, jgreen@csd.harris.com, Computer Systems Division. "The only thing that really scares me is a person with no sense of humor." -- Jonathan Winters Label: comp.graphics
  • 42. EMAIL MULTICLASS DATASET (20 classes)
  • 44. Upload Your Resume Now painstakingly fill out this form containing all of the exact same information
  • 50. Mimicking the human recruiter Feature Hunt ONE FEATURE AT A TIME INCREMENTAL GAINS
  • 52. Unstructured ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE AUTOMATIC FEATURE GENERATION Structured I want Want to to go work here PERF. 1 1 1 1 1 95% 1 0 0 0 0 22% 0 0 0 0 0 57% 1 0 0 0 0 91% 0 0 0 0 0 11% ESSAY I want to work here I have great teamwork Synergy I have so much grit They fired that individual
  • 53. ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE AUTOMATIC FEATURE GENERATION ESSAY I want to work here I have great teamwork Synergy I have so much grit They fired that individual ESSAY 3 2 1 4 5 3 7 67 345 54 3 7 99 10234 78 203 501 14 1 2 3 4 5 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 LSTM RAW TEXT WORD SEQUENCE ENCODING
  • 57. BEGIN SCRATCHING AT LAYOUT AUTOMATIC FEATURE GENERATION (LAYOUT) CNN: bit.ly/pacon
  • 59. Would you ever hire from just a resume? INTERVIEW MODELING SOFT/TECHNICAL COMPETENCIES Resume can overstate and understate
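
The unigram and bigram tables on slides 16 and 17, and the label conversion on slide 19, can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the code behind the talk: the helper names (ngrams, build_vocabulary, vectorize), the lowercasing and whitespace splitting, and the pos/neg labels attached to the example essays are all invented for the example.

```python
from collections import Counter


def ngrams(text, n):
    """Split on whitespace (an assumption) and return every run of n adjacent words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_vocabulary(texts, n):
    """Give each distinct n-gram its own column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for gram in ngrams(text, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(texts, vocab, n):
    """Return one row per text and one column per n-gram; each cell is a count."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for gram, count in Counter(ngrams(text, n)).items():
            if gram in vocab:
                row[vocab[gram]] = count
        rows.append(row)
    return rows


# Essays from the slides; the labels are hypothetical, chosen only for illustration.
essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]
labels = ["pos", "neg", "pos", "pos", "neg"]

# Convert string labels to integers, as on the "Convert labels to integers" slide.
label_to_int = {"neg": 0, "pos": 1}
y = [label_to_int[label] for label in labels]

unigram_vocab = build_vocabulary(essays, 1)
bigram_vocab = build_vocabulary(essays, 2)
X_unigram = vectorize(essays, unigram_vocab, 1)
X_bigram = vectorize(essays, bigram_vocab, 2)

print(X_unigram[0][:7])  # columns: i, want, to, work, here, have, great
print(len(unigram_vocab), len(bigram_vocab))
```

Note that the one-word essay "Synergy" produces no bigrams at all, so its bigram row is all zeros. With a real corpus a library vectorizer with sparse output would normally replace these helpers, because the column count grows quickly as n increases.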

Editor's notes

  1. My name is Ben Taylor, and I’m the Chief Data Scientist for a great startup called HireVue. Today I will be talking about NLP as well as deep learning. This talk is meant as an introduction for those who are less familiar with NLP.
  2. HR has seen a great cross pollination from other industries, I am an example of that. I studied chemical engineering through both undergrad and graduate programs. I then went and worked as a quant for a Manhattan hedge fund manager on a 600 GPU cluster. And… now I’m in HR. Oh, and I LOVE love love backcountry snowboarding. I took this photo last week and I go 2-3 times a week before work. I will never work anywhere besides Utah because of this.
  3. What is HireVue? We are a digital interviewing and interaction company. We are backed by Sequoia Capital. In 2014 we were #10 on Forbes' most promising companies list. We are global, supporting digital interviews in 189 countries.
  4. Building predictive models from competencies or other numeric features is straightforward. You take the columns or features of interest on the left and the performance labels on the right, and you pass them through a type of regression. Excel will do this; many programs will do this just fine. If you are more advanced, you can use tools like R or Python to run more advanced regressions such as random forest or gradient boosting regression. Raise your hand if you know how to build a model from this data.
  5. Now, to throw a wrench in your process, I have decided to inject open-ended essay responses into my feature set. Raise your hand again if you know how to build a predictive model with this. Most classical statisticians, mathematicians, and analysts are justifiably confused by this.
  6. Like most data science or machine learning tricks, once they are explained at a 5th grade level, we tend to be underwhelmed. The computer can’t understand raw text in its native format; it must be converted to numbers. One way to accomplish this is to map the text to numeric replacements. Good, better, and best can become 1, 2, and 3. What would you do if you had something like male or female? You can’t map these, because if you made male 2 and female 1, are you being sexist? They are completely different and can’t be directly compared. Therefore they must be tokenized: a column is created for each value, and each column holds a 0 or a 1.
  7. In the case of text you can have a LOT of columns. In some cases you may exceed 10,000, 100,000, or even 10M columns. Imagine attempting to open a dataset like this in Excel, with over 1M columns. You have to use special software in R or Python that can handle these types of data objects in a compressed sparse format. Can anyone see what the problems are with this approach? There is a major drawback. [sequence loss]
  8. Bag of words! This is called bag of words because you can visualize the words as if they are picked up by a paper bag. All sequence and ordering is lost. Is that a problem? Maybe.
  9. I analyzed some Twitter data for Skullcandy a few years ago. When we presented our results to the engineering team we asked them, “If someone says the F word in a tweet and tags your company… is that a bad thing?” Think about it: for most of us in the room, with the companies we work for and represent, does thinking about that give you anxiety? It gives us anxiety because we know it would be really bad. Skullcandy knew their customer base well enough; they said they were sure. And sure enough, the data showed that half of the people saying the F word on Twitter and tagging Skullcandy said nice things, and the other half said mean things. So a word that is typically polarizing had no impact. Bad... “Bad” is a bad word. Ass... “Ass” is a bad word. But... if I say “bad ass”, my bag of words method is going to see that as a very, very bad thing, when in fact it is a very nice thing. How do we fix that?
  10. Introduction to n-gram tuples. Not only can we create unique placeholders for words, but we can also do it for word pairs. We can do it for single words, two words, three words, as many words in a row as we want. But as we do that the column count explodes exponentially and… the recurrence of each observation goes down... Both of these are bad, so you have rapidly diminishing returns. Also, if you throw a single adjective or word in between your expected bi-gram, it won’t be found.
  13. Excel 2002: 256 columns. Excel 2003: 256 columns. Excel 2016: 1,048,576 rows by 16,384 columns.
  26. One point to bring up is proper model validation. I used to be confused when someone said train on 70, test on 30, or train on 80, test on 20. Who was right? The answer I have settled on now is that neither of them is right. Explain the conflict.
  29. Now that we have some basic NLP background we will change gears to RESUME modeling. Who hates looking through stacks of resumes? Not sure? Sometimes it can be fun, but depending on the stack size you might be spending 30 seconds on a resume, or 7 seconds. How quickly can you screen a resume? Think about what you are doing when you screen a resume. Where are your eyes looking? School? GPA? Skills? Work history? Name? Hopefully you don’t look at the name.
  30. To review a possible flow using NLP: first we have an unstructured resume, we are forced to structure it somehow, and then we tokenize or munge the data into numeric form.
  31. Sometimes we can predict things without opening up the resume. Check out these file extensions. It is hard to see, but statistically someone who uploads a DOC resume is more likely to interview well than someone who uploads an RTF. Likewise DOCX beats DOC, and PDF beats DOCX.
  32. What do we do with ALL of these formats? DOCX, TXT, PDF? This is actually a big problem; we can’t do anything cool until we standardize the formats. Luckily there is a free open source office platform that can do the conversion for us. I recommend converting to either TXT or HTML.
  33. Now that we have text we can write specific feature grabbers, such as one for GPA. For the resumes we analyzed, we noticed that a GPA was included in only 1 of 5 resumes. Also, this is where the distributions fell: not very many people with a GPA below 3.00 report it. What do you do if someone does not include a GPA? When a feature is missing you MUST replace it. Do you replace the GPA with a 0? That’s harsh. A 2.0? 4.0? The average? It depends.
  34. Testing prediction quality, we found that optimal prediction quality comes when we replace a missing GPA with a 3.6. What does that mean? It means that if you have less than a 3.6 GPA, as far as the computer is concerned, including it doesn’t help you.
  35. There are so many features to create in the case of a resume model that you can save yourself a lot of time by using a resume parsing service. The majority of the value comes from bag of words (BOW). You quickly begin approaching incremental returns, where a LOT of effort results in marginal gain. Malicious resume.
  36. The biggest value that deep learning offers is automatic feature discovery. This has been incredibly valuable with images, hitting new high points. It can also be valuable for text, allowing you to forget the concept of a tuple or n-gram.
  37. In the end the computer always needs a number, but in this case it is looking at very large sequences of numbers (100-300 word windows). Run it on the entire resume: what is the prediction?
  41. Fun tangent, does resume formatting matter? Margins? Font size, layout?
  42. Would you ever hire from just a resume? Why not?
  43. For interview modeling we use spoken text, which is more difficult because of transcription accuracy. Raw audio (utterance, repetition). Video, micro-expressions (Lie to Me).
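
Returning to the word sequence encoding of slide 53 and note 37: instead of one column per word or n-gram, each text becomes a fixed-length sequence of integer word ids, so word order survives. The sketch below is a hedged illustration of that idea in plain Python. The function names (build_word_index, encode), the choice to reserve id 0 for padding, padding on the right, dropping unseen words, and the tiny window of 6 (the notes mention 100-300 word windows) are all assumptions for the example, not details taken from the talk.

```python
def build_word_index(texts):
    """Map each distinct word to an integer id starting at 1; 0 is kept for padding."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index


def encode(text, word_index, window):
    """Turn a text into exactly `window` ids: truncate long texts, right-pad short ones.

    Words missing from word_index are dropped, which is one simple choice among several.
    """
    ids = [word_index[w] for w in text.lower().split() if w in word_index]
    ids = ids[:window]
    return ids + [0] * (window - len(ids))


essays = [
    "I want to work here",
    "I have great teamwork",
    "Synergy",
    "I have so much grit",
    "They fired that individual",
]

word_index = build_word_index(essays)
window = 6  # illustration only; the notes describe windows of 100-300 words
sequences = [encode(essay, word_index, window) for essay in essays]

for essay, sequence in zip(essays, sequences):
    print(sequence, essay)

# The same words in a different order give a different sequence,
# whereas their bag of words rows would be identical.
print(encode("work here I want to", word_index, window))
```

Sequences of this shape are what a sequence model such as the LSTM named on slide 53 would consume, typically after an embedding step that is not shown here.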