Master tutorial on resume modeling given at SIOP 2016 in California. Please let me know if you have any questions on this topic. NLP can be very powerful for predicting candidate performance, but it can also be dangerous if adverse impact is not considered from the beginning.
6. GRIT   MOTIVATION   ENGAGEMENT   PERFORMANCE
   1      55           80           95%
   0      75           10           22%
   0      50           20           57%
   1      20           90           91%
   0      40           60           11%
Basic Tutorial On How To Build A Numeric Feature Model
BUILDING A MODEL
7. ESSAY                        GRIT   MOTIVATION   ENGAGEMENT   PERFORMANCE
   I want to work here          1      55           80           95%
   I have great teamwork        0      75           10           22%
   Synergy                      0      50           20           57%
   I have so much grit          1      20           90           91%
   They fired that individual   0      40           60           11%
Now what?!?
BUILDING A MODEL
8. ESSAY                        PERFORMANCE
   I want to work here          95%
   I have great teamwork        22%
   Synergy                      57%
   I have so much grit          91%
   They fired that individual   11%
There are really two different options: mapping or tokenizing
BUILDING A MODEL
Map:
Bad = 0
Good = 1
Better = 2
Best = 3
Tokenize (one 0/1 column per value):
Female   Male
1        0
0        1
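A minimal sketch of both options, assuming pandas (the deck only names R and Python generically; the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"rating": ["bad", "good", "better", "best"],
                   "gender": ["female", "male", "male", "female"]})

# Map: ordered categories become integers
df["rating_num"] = df["rating"].map({"bad": 0, "good": 1, "better": 2, "best": 3})

# Tokenize: unordered categories become one 0/1 column per value
df = pd.get_dummies(df, columns=["gender"])
print(df)
```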
9. I   want   to   work   here   have   great   PERF.
   1   1      1    1      1      0      0       95%
   1   0      0    0      0      1      1       22%
   0   0      0    0      0      0      0       57%
   1   0      0    0      0      1      0       91%
   0   0      0    0      0      0      0       11%
Tokenize the text into unique word columns
BUILDING A MODEL
ESSAY                        PERFORMANCE
I want to work here          95%
I have great teamwork        22%
Synergy                      57%
I have so much grit          91%
They fired that individual   11%
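A minimal sketch of that tokenization step, assuming scikit-learn's CountVectorizer (any bag-of-words vectorizer works the same way):

```python
from sklearn.feature_extraction.text import CountVectorizer

essays = ["I want to work here",
          "I have great teamwork",
          "Synergy",
          "I have so much grit",
          "They fired that individual"]

# binary=True gives 1/0 flags; this token_pattern keeps one-letter words like "I"
vec = CountVectorizer(binary=True, token_pattern=r"(?u)\b\w+\b")
X = vec.fit_transform(essays)        # compressed sparse matrix, one row per essay
print(vec.get_feature_names_out())   # the unique word columns
print(X.toarray())
```

Note that fit_transform returns a compressed sparse matrix, which is how these very wide datasets stay tractable.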
10. I   want   to   work   here   have   great   PERF.
    1   1      1    1      1      0      0       95%
    1   0      0    0      0      1      1       22%
    0   0      0    0      0      0      0       57%
    1   0      0    0      0      1      0       91%
    0   0      0    0      0      0      0       11%
Bag-of-words modeling: sequence and ordering are lost
BUILDING A MODEL
11. Bag-of-words modeling: sequence and ordering are lost
BUILDING A MODEL
12. I want   want to   to work   work here   PERF.
    1        1         1         1           95%
    0        0         0         0           22%
    0        0         0         0           57%
    0        0         0         0           91%
    0        0         0         0           11%
Band-Aid: Concept of n-grams
BUILDING A MODEL
14. We need a labeled dataset; sometimes getting one with labels is the biggest challenge of all.
SENTIMENT DATASET, 1.5M TWEETS
label text
neg @Christian_Rocha i miss u!!!!!
pos @llanitos there's still some St Werburghs hone...
pos @Ashley96 it's me
neg @Phillykidd we use to be like bestfriends
neg Just got back from Manchester. I went to the T...
pos @LauraDark thnks x el rt
neg "Ughh it's so hot & the singing lady is st...
neg @hnprashanth @dkris I was out to my native for...
pos Girls night with the bests Wish you were here J!
neg Just watched @paulkehler rock the crap out of ...
pos i got the gurl! i got the ride! now im just on...
pos @ninthspace how is the table building going?
pos by d way guyz I must log out na see u again to...
neg @dreday11 its only 20 mins...
Sentiment140 (cs.stanford.edu); the pos/neg labels were derived from :) and :( emoticons
15. Before we can process this, we need to format it properly to get it ready
SENTIMENT DATASET - FORMATTING
text
@Christian_Rocha i miss u!!!!!
@llanitos there's still some St Werburghs hone...
@Ashley96 it's me
@Phillykidd we use to be like bestfriends
Just got back from Manchester. I went to the T...
@LauraDark thnks x el rt
"Ughh it's so hot & the singing lady is st...
@hnprashanth @dkris I was out to my native for...
Girls night with the bests Wish you were here J!
Just watched @paulkehler rock the crap out of ...
i got the gurl! i got the ride! now im just on...
@ninthspace how is the table building going?
by d way guyz I must log out na see u again to...
@dreday11 its only 20 mins...
Python list
16. Now we can go all the way to model training and prediction
SENTIMENT DATASET – UNIGRAM
y
[0,1,0,1,1]
text_data
[['this is a tweet']
['sounds good']
['not really']]
I   want   to   work   here   have   great
1   1      1    1      1      0      0
1   0      0    0      0      1      1
0   0      0    0      0      0      0
1   0      0    0      0      1      0
0   0      0    0      0      0      0
17. Now we can go all the way to model training and prediction
SENTIMENT DATASET – BIGRAM
I want   want to   to work   work here
1        1         1         1
0        0         0         0
0        0         0         0
0        0         0         0
0        0         0         0
text_data
[['this is a tweet']
['sounds good']
['not really']]
y
[0,1,0,1,1]
20. Convert labels to integers
SENTIMENT DATASET - FORMATTING
model.fit(X,Y)
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
21. Now we can go all the way to model training and prediction
SENTIMENT DATASET – BUILD A MODEL
y
[0,1,0,1,1]
X
[4,0,0,0,0,7,0,0,1]
[0,0,0,0,9,0,0,0,2]
PERFORMANCE?
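Putting the pieces together, here is a minimal end-to-end sketch assuming scikit-learn, with a tiny stand-in for the 1.5M-tweet dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the tweet data; labels: 0 = neg, 1 = pos
text_data = ["this is a tweet", "sounds good", "not really",
             "i miss u", "girls night with the bests"]
y = [0, 1, 0, 0, 1]

vec = CountVectorizer()
X = vec.fit_transform(text_data)     # the numeric matrix the model needs

model = LogisticRegression()
model.fit(X, y)
print(model.predict(vec.transform(["sounds really good"])))
```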
39. EMAIL MULTICLASS DATASET (20 classes)
From: lerxst@wam.umd.edu (where's my thing)
Subject: WHAT car is this!?
Nntp-Posting-Host: rac3.wam.umd.edu
Organization: University of Maryland, College Park
Lines: 15
MSG: I was wondering if anyone out there could enlighten me on this car I saw the other day. It was a 2-door sports car, looked to be from the late 60s/early 70s. It was called a Bricklin. The doors were really small. In addition, the front bumper was separate from the rest of the body. This is all I know. If anyone can tell me a model name, engine specs, years of production, where this car is made, history, or whatever info you have on this funky looking car, please e-mail. Thanks, - IL ---- brought to you by your neighborhood Lerxst ----
rec.autos
40. EMAIL MULTICLASS DATASET (20 classes)
From: guykuo@carson.u.washington.edu (Guy Kuo)
Subject: SI Clock Poll - Final Call
Summary: Final call for SI clock reports
Keywords: SI,acceleration,clock,upgrade
Article-I.D.: shelley.1qvfo9INNc3s
Organization: University of Washington
Lines: 11
NNTP-Posting-Host: carson.u.washington.edu
MSG: A fair number of brave souls who upgraded their SI clock oscillator have shared their experiences for this poll. Please send a brief message detailing your experiences with the procedure. Top speed attained, CPU rated speed, add-on cards and adapters, heat sinks, hour of usage per day, floppy disk functionality with 800 and 1.4 m floppies are especially requested. I will be summarizing in the next two days, so please add to the network knowledge base if you have done the clock upgrade and haven't answered this poll. Thanks. Guy Kuo <guykuo@u.washington.edu>
comp.sys.mac.hardware
41. EMAIL MULTICLASS DATASET (20 classes)
From: jgreen@amber (Joe Green)
Subject: Re: Weitek P9000 ?
Organization: Harris Computer Systems Division
Lines: 14
Distribution: world
NNTP-Posting-Host: amber.ssd.csd.harris.com
X-Newsreader: TIN [version 1.1 PL9]
MSG: Robert J.C. Kyanko (rob@rjck.UUCP) wrote:
> abraxis@iastate.edu writes in article <abraxis.734340159@class1.iastate.edu>:
> > Anyone know about the Weitek P9000 graphics chip?
> As far as the low-level stuff goes, it looks pretty nice. It's got this
> quadrilateral fill command that requires just the four points.
Do you have Weitek's address/phone number? I'd like to get some information about this chip.
-- Joe Green    Harris Corporation    jgreen@csd.harris.com    Computer Systems Division
"The only thing that really scares me is a person with no sense of humor." -- Jonathan Winters
comp.graphics
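This is the classic 20 Newsgroups dataset, which ships with scikit-learn; a minimal multiclass sketch, assuming that loader:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vec = TfidfVectorizer()
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

clf = LogisticRegression(max_iter=1000)  # handles the 20 classes one-vs-rest
clf.fit(X_train, train.target)
print(accuracy_score(test.target, clf.predict(X_test)))
```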
52. Unstructured
ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
Structured
I want   want to   to work   work here   PERF.
1        1         1         1           95%
0        0         0         0           22%
0        0         0         0           57%
0        0         0         0           91%
0        0         0         0           11%
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
53. ENGINEERS AND MANUAL FEATURES ARE EXPENSIVE, USING DEEP LEARNING TO AUTOMATE
AUTOMATIC FEATURE GENERATION
ESSAY
I want to work here
I have great teamwork
Synergy
I have so much grit
They fired that individual
ESSAY → WORD SEQUENCE (one integer index per word):
3 2 1 4 5
3 7 67 345
54
3 7 99 10234
78 203 501 14

WORD SEQUENCE → ENCODING (one-hot over indices 1–5):
1 2 3 4 5
0 0 0 1 0
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0

RAW TEXT → WORD SEQUENCE → ENCODING → LSTM
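A minimal Keras sketch of that pipeline (raw text → word sequence → encoding → LSTM); the layer sizes and labels are illustrative assumptions, not values from the talk:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

essays = ["I want to work here", "I have great teamwork", "Synergy",
          "I have so much grit", "They fired that individual"]
y = np.array([1, 0, 0, 1, 0])         # toy labels

tok = Tokenizer()                     # raw text -> integer word indices
tok.fit_on_texts(essays)
X = pad_sequences(tok.texts_to_sequences(essays), maxlen=10)

model = Sequential([
    Embedding(input_dim=len(tok.word_index) + 1,  # vocabulary size
              output_dim=16),                     # learned encoding per word
    LSTM(8),                          # reads the word sequence in order
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam")
model.fit(X, y, epochs=2, verbose=0)
```

The Embedding layer replaces the hand-built encoding, which is exactly the automatic feature generation the slide is pointing at.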
My name is Ben Taylor, I’m the Chief Data Scientist for a great startup called HireVue. Today I will be talking about NLP as well as deep learning.
This talk is meant to be an introduction for those who are less familiar with NLP
HR has seen great cross-pollination from other industries; I am an example of that.
I studied chemical engineering through both undergrad and graduate programs.
I then went and worked as a quant for a Manhattan hedge fund manager on a 600 GPU cluster.
And… now I’m in HR.
Oh, and I LOVE love love backcountry snowboarding. I took this photo last week and I go 2-3 times a week before work. I will never work anywhere besides Utah because of this.
What is HireVue?
We are a digital interviewing & interaction company
We are backed by Sequoia Capital
In 2014 we were #10 on Forbes' list of most promising companies
Global, supporting digital interviews in 189 countries
Building predictive models from competencies or other numeric features is straightforward.
You take the columns or features of interest on the left, and the performance labels on the right and you pass them through a type of regression.
Excel will do this, many programs will do this just fine.
If you are more advanced you can use tools like R or Python to fit fancier regressions like random forest, gradient boosting regression, or others, as sketched below.
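As a sketch of that baseline case, assuming scikit-learn and the toy GRIT/MOTIVATION/ENGAGEMENT table from the slides:

```python
from sklearn.ensemble import GradientBoostingRegressor

# GRIT, MOTIVATION, ENGAGEMENT columns from the slide table
X = [[1, 55, 80], [0, 75, 10], [0, 50, 20], [1, 20, 90], [0, 40, 60]]
y = [0.95, 0.22, 0.57, 0.91, 0.11]   # PERFORMANCE as a fraction

model = GradientBoostingRegressor(n_estimators=50)
model.fit(X, y)
print(model.predict([[1, 60, 70]]))  # score a new candidate
```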
Raise your hand if you know how to build a model from this data?
Now, to throw a wrench in your process, I have decided to inject open-ended essay responses into my feature set.
Raise your hand again if you know how/what to build a predictive model with this?
Most classical statisticians, mathematicians, and analysts are justifiably confused by this.
Like most data science or machine learning tricks, once they are explained at a 5th grade level, we tend to be underwhelmed.
The computer can’t understand the raw text in its native format, it must convert them to numbers. One way to accomplish this is to map the text to numeric replacements.
Good, better, and best can become 1, 2, and 3.
What would you do if you had something like male or female? You can't simply map these: if you make male 2 and female 1, are you being sexist?
They are completely different categories and can't be directly compared. Therefore they must be tokenized, where each value gets its own column, so new columns are created.
In the case of text you can have a LOT of columns. In some cases you may exceed 10,000, 100,000, or even 10M columns.
Imagine attempting to open a dataset like this in Excel, with over 1M columns. You have to use special software in R or Python that can handle these data objects in a compressed sparse format.
Can anyone see what the problems are with this approach? There is a major drawback: sequence loss.
Bag of words! This is called bag of words because you can visualize the words as if they are picked up by a paper bag. All sequence and ordering is lost.
Is that a problem? Maybe.
I analyzed some twitter data for Skullcandy a few years ago. When we presented our results to the engineering team we asked them “If someone says the F word in a tweet and tags your company… is that a bad thing?”.
Think about it, for most of us in the room, with the companies we work for and represent does that give you anxiety thinking about that? The reason that gives us anxiety is because we know that would be a terrible thing and it would be really bad.
Skullcandy knew their customer base well enough; they said they were sure. And sure enough, the data showed that half of the people saying the F word on Twitter and tagging Skullcandy said nice things, and the other half said mean things. So a word that is typically polarizing had no impact.
Bad.... Bad is a bad word
Ass.... Ass is a bad word
But... if I say "bad ass", my bag-of-words method is going to see that as a very, very bad thing, when in fact it is a very nice thing. How do we fix that?
Introduction to n-gram tuples. Not only can we create unique placeholders for single words, we can also do it for word pairs.
We can do it for single words, two words, three words, as many as we want. But as we do, the column count explodes exponentially and the recurrence of each observation goes down. Both of these are bad, so you get rapidly diminishing returns.
Also, if a single adjective or other word lands in between your expected bigram, it won't be found.
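The n-gram fix in code, assuming scikit-learn's ngram_range parameter:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["that movie was bad", "what a bad ass movie"]

# (1, 2) keeps single words AND word pairs, so "bad ass" gets its own
# placeholder column, distinct from "bad" on its own
vec = CountVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())   # watch the column count grow
```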
One point to bring up is proper model validation. I used to be confused when someone said train on 70, test on 30.
Or train on 80, test on 20. Who was right?
The answer I have settled on now is that both can be right; there is a built-in conflict.
The conflict: more training data gives you a better model, while more test data gives you a more trustworthy performance estimate, so any split is a trade-off.
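One way to sidestep the 70/30-versus-80/20 argument is cross-validation, where every row is used for both training and testing across folds; a sketch assuming scikit-learn and synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# Single holdout: one arbitrary split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print(LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te))

# 5-fold cross-validation: every row is tested exactly once
print(cross_val_score(LogisticRegression(), X, y, cv=5).mean())
```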
Now that we have some basic NLP background we will change gears to RESUME modeling. Who hates looking through stacks of resumes?
Not sure, sometimes it can be fun, but depending on the stack size you might be spending 30 seconds on a resume, or only 7 seconds. How quickly can you screen a resume?
Think about what you are doing when you screen a resume. Where are your eyes looking?
School
GPA
Skills
Work history
Name? Hopefully you don’t look at name.
To review a possible flow using NLP: first we have an unstructured resume, we are forced to structure it somehow, and then we tokenize or munge the data into numeric features.
Sometimes we can predict things without opening up the resume.
Check out these file extensions. It is hard to see, but statistically someone who uploads a DOC resume is more likely to interview well than someone who uploads an RTF.
Likewise DOCX beats DOC
And PDF beats DOCX
What do we do with ALL of these formats? DOCX, txt, pdf?
This is actually a big problem; we can't do anything cool until we standardize the formats.
Luckily there is a free open source office platform that can do the conversion for us. I recommend converting it to either txt or html.
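Assuming the platform meant here is LibreOffice (the best-known free option), its headless mode can batch-convert resumes from the command line; a sketch with a hypothetical file name:

```python
import subprocess

# soffice is LibreOffice's CLI; "resume.docx" is a placeholder file name
subprocess.run(["soffice", "--headless", "--convert-to", "txt",
                "--outdir", "converted", "resume.docx"], check=True)
```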
Now that we have text we can write specific feature grabbers, like one for GPA. For the resumes we analyzed, we noticed that a GPA was included on only about 1 in 5 resumes.
This is also where the distributions fell: very few GPAs below 3.00 get reported.
What do you do if someone does not include a GPA? When a feature is missing you MUST replace it.
Do you replace the GPA with a 0? That's harsh. A 2.0? A 4.0? The average? It depends.
Testing prediction quality, we found that optimal prediction comes when we replace a missing GPA with a 3.6.
What does that mean?
That means if you have less than a 3.6 GPA, as far as the computer is concerned, including it doesn’t help you.
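A sketch of a GPA feature grabber plus that imputation rule; the regex is an illustrative assumption, and 3.6 is the fill value the talk reports as optimal:

```python
import re

def grab_gpa(text):
    """Pull the first GPA-looking number out of resume text, if any."""
    m = re.search(r"GPA[:\s]*([0-4]\.\d{1,2})", text, flags=re.IGNORECASE)
    return float(m.group(1)) if m else None

gpas = [grab_gpa(t) for t in ["GPA: 3.85, Dean's List", "no gpa mentioned"]]
# A missing feature MUST be replaced; 3.6 was the empirically optimal fill
gpas = [g if g is not None else 3.6 for g in gpas]
print(gpas)  # [3.85, 3.6]
```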
There are so many features to create in the case of a resume model; you can save yourself a lot of time using a resume parsing service.
The majority of the value comes from BOW.
You quickly reach diminishing returns, where a LOT of effort results in marginal gain.
Malicious resume
The biggest value that deep learning offers is automatic feature discovery. This has been incredibly valuable with images, hitting new high points.
It can also be valuable for text, allowing you to forget the concept of a tuple or n-gram.
In the end the computer always needs a number, but in this case it is looking at very large sequences of numbers (100-300 word windows).
Run it on entire resume:
What is the prediction?
Fun tangent, does resume formatting matter? Margins? Font size, layout?
Would you ever hire from just a resume? Why not?
For interview modeling we use spoken text, which is more difficult because of transcription inaccuracies.
Raw audio (utterance, repetition)
Video, micro expression (Lie to me)