RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting
Andrzej Pacuk Piotr Sankowski Karol Węgrzycki
Adam Witkowski Piotr Wygocki
apacuk@mimuw.edu.pl
University of Warsaw
RecSys Challenge 2016
mim-solutions.pl RecSys Challenge 2016
Outline
1 Challenge statement
2 Our Solution
  Candidate items selection
  Learning probabilities
  Features
3 What could we do better?
Problem
Xing.com dataset:
user profiles (experience, education, current job’s roles, etc.),
job (item) offer description (title, tags, employment type, etc.),
past recommendations (impressions),
user positive (clicking, bookmarking, replying) and negative (deleting) interactions with items.
Task: predict which items each user will interact with positively.
Evaluation
Secret ground truth (GT): positive interactions from the test week.
Mean-average-precision-like (MAP-like) measure.
Online evaluation.
Finished 2nd!
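The slides only describe the measure as MAP-like, so the exact challenge formula is not reproduced here. For orientation, a minimal sketch of standard MAP@k, assuming k = 30 (the number of recommendations kept per user); the function names are hypothetical:

```python
from typing import Dict, List, Set


def average_precision_at_k(recommended: List[int], relevant: Set[int], k: int = 30) -> float:
    """Standard AP@k: mean of precision@rank over the ranks where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision(recs: Dict[int, List[int]], gt: Dict[int, Set[int]], k: int = 30) -> float:
    """Average AP@k over all users with at least one ground-truth item."""
    users = [u for u in gt if gt[u]]
    if not users:
        return 0.0
    return sum(average_precision_at_k(recs.get(u, []), gt[u], k) for u in users) / len(users)
```

Users who appear in the GT but get no recommendations contribute an AP of 0, which is what pulls the mean down when candidate selection misses them.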
Solution’s schema
user
→ select candidates: job #1, job #2, job #3, ..., job #N
→ predict probabilities: job #1: 0.3, job #2: 0.7, job #3: 0.4, ..., job #N: 0.5
→ sort: job #15: 0.9, job #34: 0.89, ..., job #124: 0.75
→ take top 30
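The two-stage schema above can be sketched as a single function. This is an illustration under assumptions: `select_candidates` and `predict_proba` are hypothetical callables standing in for the candidate selection and the trained model.

```python
from typing import Callable, List


def recommend(
    user: int,
    select_candidates: Callable[[int], List[int]],
    predict_proba: Callable[[int, int], float],
    top_n: int = 30,
) -> List[int]:
    """Two-stage recommendation: preselect candidates, score each (user, job) pair, keep the top_n."""
    candidates = select_candidates(user)
    scored = [(predict_proba(user, job), job) for job in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest probability first
    return [job for _, job in scored[:top_n]]


# Toy probabilities taken from the schema above.
probs = {1: 0.3, 2: 0.7, 3: 0.4, 15: 0.9, 34: 0.89, 124: 0.75}
print(recommend(7, lambda u: list(probs), lambda u, j: probs[j], top_n=3))  # [15, 34, 124]
```

Splitting the work this way means the model only has to score a few hundred jobs per user instead of the whole catalogue.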
Training set
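Training GT: positive interactions from the last week of the available data.
This gives a local score, computed offline against the training GT.
Candidates and features are computed separately for the training set (from the data before the last week) and for the full dataset!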
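A minimal sketch of this split, assuming a hypothetical event record `(user_id, item_id, week, is_positive)`: the last week's positive interactions become the training GT, and only earlier events may feed the training candidates and features. For the submission, the same candidate and feature code would then be run on the full dataset.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical event record: (user_id, item_id, week, is_positive)
Event = Tuple[int, int, int, bool]


def split_for_training(events: List[Event]) -> Tuple[List[Event], Dict[int, Set[int]]]:
    """Hold out the last week: its positive interactions form the training GT,
    while only the earlier events are returned for building candidates and features."""
    last_week = max(week for _, _, week, _ in events)
    history = [e for e in events if e[2] < last_week]
    gt: Dict[int, Set[int]] = defaultdict(set)
    for user, item, week, positive in events:
        if week == last_week and positive:
            gt[user].add(item)
    return history, dict(gt)
```

Keeping the held-out week out of the features avoids leaking the training labels into the model's inputs.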
Candidates
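Candidate: an item i with a high probability $P[i \in \mathrm{GT}(u)]$ of being in user u's ground truth.
20 candidate categories.
Within a category, candidates are ranked, e.g., by sorting interactions by timestamp.
∼300 candidates per user (0.1% of all items).
Candidates cover 37% of the training GT.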
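The slides do not say how the 20 ranked categories are combined into about 300 candidates. A minimal sketch, assuming each category list is truncated to a hypothetical `per_category` length and that "cover" means the fraction of ground-truth (user, item) pairs found among that user's candidates:

```python
from typing import Dict, List, Set


def merge_candidates(category_lists: List[List[int]], per_category: int) -> List[int]:
    """Take the top `per_category` items from every ranked category list, dropping duplicates."""
    seen: Set[int] = set()
    merged: List[int] = []
    for ranked in category_lists:
        for item in ranked[:per_category]:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def gt_coverage(candidates: Dict[int, List[int]], gt: Dict[int, Set[int]]) -> float:
    """Fraction of all ground-truth (user, item) pairs present in that user's candidates."""
    total = sum(len(items) for items in gt.values())
    if total == 0:
        return 0.0
    covered = sum(len(gt[u] & set(candidates.get(u, []))) for u in gt)
    return covered / total
```

Coverage bounds what the ranking stage can achieve, because a GT item that never becomes a candidate cannot be recommended.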
Candidate categories
User's interactions (Int(u)), sorted by week and by event count within the week,
Similarly for impressions (Imp(u)),
Int(u′) for other users u′, sorted by $\mathrm{Jaccard}(\mathrm{Int}(u), \mathrm{Int}(u'))$.
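The last category can be sketched as follows, assuming interactions are stored as a hypothetical mapping from user to the set of items in Int(u); the function names and the `limit` parameter are illustrative only:

```python
from typing import Dict, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_user_candidates(user: int, interactions: Dict[int, Set[int]], limit: int = 10) -> List[int]:
    """Collect items from Int(u') of other users u', most similar users (by Jaccard) first."""
    own = interactions.get(user, set())
    others = sorted(
        (u for u in interactions if u != user),
        key=lambda u: jaccard(own, interactions[u]),
        reverse=True,
    )
    result: List[int] = []
    for other in others:
        if jaccard(own, interactions[other]) == 0.0:
            break  # users are sorted, so nobody after this shares any item with `user`
        for item in sorted(interactions[other]):
            if item not in result:
                result.append(item)
    return result[:limit]
```

The pairwise loop is quadratic in the number of users; at dataset scale, an inverted index from item to users would be needed to find users with a nonzero overlap.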
Candidates (cold start)
items i sorted by $\max_{i' \in \mathrm{Int}(u)} |\mathrm{tags}(i) \cap \mathrm{tags}(i')|$,
items i sorted by $|\mathrm{jobroles}(u) \cap \mathrm{tags}(i)|$,
globally most popular items.
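The two tag-based orderings can be sketched as shown below. Assumptions: tags and job roles are plain strings (the real dataset uses IDs), `item_tags` is a hypothetical mapping from item to its tag set, and skipping items already in Int(u) in the first function is a choice made for this sketch.

```python
from typing import Dict, List, Set


def tag_overlap_candidates(user_items: Set[int], item_tags: Dict[int, Set[str]], limit: int = 10) -> List[int]:
    """Rank items i by the max over i' in Int(u) of |tags(i) ∩ tags(i')|."""
    def score(item: int) -> int:
        tags = item_tags[item]
        return max((len(tags & item_tags.get(seen, set())) for seen in user_items), default=0)

    # Items the user already interacted with are skipped here (an assumption of this sketch).
    ranked = sorted((i for i in item_tags if i not in user_items), key=score, reverse=True)
    return [i for i in ranked if score(i) > 0][:limit]


def jobrole_candidates(jobroles: Set[str], item_tags: Dict[int, Set[str]], limit: int = 10) -> List[int]:
    """Rank items i by |jobroles(u) ∩ tags(i)|; needs only the user profile, no interactions."""
    ranked = sorted(item_tags, key=lambda i: len(jobroles & item_tags[i]), reverse=True)
    return [i for i in ranked if jobroles & item_tags[i]][:limit]
```

The job-role ordering still works for a user with no interactions at all, which is exactly the cold-start case.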
Candidate ranking
XGBoost (gradient boosted decision trees).
Optimizing log loss.
Training file built from the preselected candidates:
all positive,
sampled negative.
Reaches 77.5% of the score of a perfect ranking of the candidates.
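A minimal sketch of building the training rows from the candidates: every candidate in the GT is kept as a positive, and a few other candidates per user are sampled at random as negatives. The names and the `negatives_per_user` default of 5 are assumptions for illustration. The resulting rows, joined with features, would then go to XGBoost with `objective="binary:logistic"`, which minimizes log loss.

```python
import random
from typing import Dict, List, Set, Tuple


def build_training_rows(
    candidates: Dict[int, List[int]],
    gt: Dict[int, Set[int]],
    negatives_per_user: int = 5,
    seed: int = 0,
) -> List[Tuple[int, int, int]]:
    """Return (user, item, label) rows: all candidates in the GT get label 1, and up to
    `negatives_per_user` randomly sampled remaining candidates get label 0."""
    rng = random.Random(seed)
    rows: List[Tuple[int, int, int]] = []
    for user, items in candidates.items():
        relevant = gt.get(user, set())
        positives = [i for i in items if i in relevant]
        negatives = [i for i in items if i not in relevant]
        rows.extend((user, i, 1) for i in positives)
        sampled = rng.sample(negatives, min(negatives_per_user, len(negatives)))
        rows.extend((user, i, 0) for i in sampled)
    return rows
```

Negative sampling keeps the training file small. The sampled class balance no longer matches the real one, but that matters less here because only the ordering of probabilities within a user is used.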
Features
A feature maps a (user, item) pair to a real number.
273 features in total, in 12 groups.
XGBoost worked well with:
highly correlated features,
null values,
no scaling/normalization.
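To make the definition concrete, here is a sketch of turning a hypothetical dictionary of named feature functions into one model input row. Undefined values become NaN, which XGBoost treats as missing by default, and no scaling is applied. The `Feature` type and the names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional

# A feature maps a (user, item) pair to a real number, or to None when it is undefined.
Feature = Callable[[int, int], Optional[float]]


def feature_row(user: int, item: int, features: Dict[str, Feature]) -> List[float]:
    """Evaluate all features in a fixed (sorted-by-name) order.

    Undefined values become NaN and are left for the model to handle; no scaling is applied.
    """
    row: List[float] = []
    for name in sorted(features):
        value = features[name](user, item)
        row.append(float("nan") if value is None else float(value))
    return row
```

A fixed column order matters because the model identifies features by position.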
Feature definitions (sample)
Event-based item: percentage of items in Int(u) having the same property (e.g., employment) as item i.
Most similar user who clicked the item: $\max_{u' \in \mathrm{Users}(i)} \mathrm{Jaccard}(\mathrm{Int}(u), \mathrm{Int}(u'))$.
Most similar item clicked by the user: $\max_{i' \in \mathrm{Int}(u)} \mathrm{Jaccard}(\mathrm{Users}(i), \mathrm{Users}(i'))$.
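The three sample features can be sketched directly from their definitions. Assumptions: `int_of` maps a user to Int(u), `users_of` maps an item to Users(i), and `prop` maps an item to one property value. All three mappings are hypothetical. Skipping u itself (resp. i itself) inside the max is a choice made here. Undefined values are returned as NaN so they behave as nulls.

```python
from typing import Dict, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def event_based_item(user_items: Set[int], item: int, prop: Dict[int, str]) -> float:
    """Percentage of items in Int(u) whose property (e.g. employment) equals that of item i."""
    if not user_items or item not in prop:
        return float("nan")  # undefined: left as a missing value
    same = sum(1 for j in user_items if prop.get(j) == prop[item])
    return 100.0 * same / len(user_items)


def most_similar_user_who_clicked(user: int, item: int, int_of: Dict[int, Set[int]],
                                  users_of: Dict[int, Set[int]]) -> float:
    """max over u' in Users(i) of Jaccard(Int(u), Int(u')); u itself is skipped (assumption)."""
    own = int_of.get(user, set())
    return max((jaccard(own, int_of.get(other, set()))
                for other in users_of.get(item, set()) if other != user), default=float("nan"))


def most_similar_item_clicked_by_user(user: int, item: int, int_of: Dict[int, Set[int]],
                                      users_of: Dict[int, Set[int]]) -> float:
    """max over i' in Int(u) of Jaccard(Users(i), Users(i')); i itself is skipped (assumption)."""
    clickers = users_of.get(item, set())
    return max((jaccard(clickers, users_of.get(other, set()))
                for other in int_of.get(user, set()) if other != item), default=float("nan"))
```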
Top feature groups
Feature group                          fscore
event-based user (item) profile          41%
tags + title                              7%
item global popularity                   22%
trend                                    10%
weekday                                   4%
most similar                             10%
  item clicked by user                    6%
  user who clicked item                   4%
user total events                         8%
  in last week                            4%
seconds from last user activity           7%
max common tags with clicked item         4%
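XGBoost's Python `Booster.get_fscore()` returns, for each feature, the number of splits that use it. The slides do not describe how group percentages were derived. One plausible aggregation, sketched with a hypothetical feature-to-group mapping, sums these counts per group and normalizes:

```python
from collections import Counter
from typing import Dict


def group_shares(fscore: Dict[str, int], group_of: Dict[str, str]) -> Dict[str, float]:
    """Sum per-feature fscores (split counts) by feature group; return each group's share in %."""
    totals: Counter = Counter()
    for feature, count in fscore.items():
        totals[group_of.get(feature, "other")] += count  # unmapped features go to "other"
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: 100.0 * count / grand_total for group, count in totals.items()}
```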
Possible improvements
Training file:
8x bigger,
sample 1/4 of the negative candidates per user (instead of 5 random ones),
score: +6.5k.
Ensembling models.
Layer scores:
Candidates selection: 37% (coverage of the training GT).
Ranking candidates: 77.5% (of a perfect candidate ranking's score).