RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting


RecSys Challenge 2016 solution that took 2nd place (leaderboard: https://recsys.xing.com/leaders). About the authors: http://mim-solutions.pl/


Challenge statement
Our Solution
What could we do better?
RecSys Challenge 2016
job recommendations based on preselection of offers and gradient boosting
Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Adam Witkowski, Piotr Wygocki
apacuk@mimuw.edu.pl
University of Warsaw
RecSys Challenge 2016
mim-solutions.pl RecSys Challenge 2016

Outline
1. Challenge statement
2. Our Solution
   - Candidate items selection
   - Learning probabilities
   - Features
3. What could we do better?

Problem
Xing.com dataset:
- user profiles (experience, education, current job roles, etc.),
- job (item) offer descriptions (title, tags, employment type, etc.),
- past recommendations (impressions),
- users' positive (clicking, bookmarking, replying) and negative (deleting) interactions with items.
Task: predict user’s positive interactions.

Evaluation
Secret ground truth (GT): positive interactions from test week.
Mean average precision-like (MAP) measure.
Online evaluation.
Finished 2nd!
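The exact challenge formula is not given on this slide. As a rough illustration of what a MAP-like measure looks like, the sketch below assumes plain average precision truncated at k = 30 (the list length used later in the deck) and lists without duplicate items. The function names are hypothetical, and this is not the official scoring code.

```python
def average_precision_at_k(recommended, relevant, k=30):
    """Average precision of one ranked list, truncated at k (assumed metric)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision(recs_by_user, gt_by_user, k=30):
    """Mean of the per-user average precision over all users in the ground truth."""
    if not gt_by_user:
        return 0.0
    total = sum(
        average_precision_at_k(recs_by_user.get(user, []), items, k)
        for user, items in gt_by_user.items()
    )
    return total / len(gt_by_user)
```

Users with no recommendations contribute zero here, which is one possible convention among several.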

Solution's schema
For each user:
1. select candidates: job #1, job #2, job #3, ..., job #N
2. predict probabilities: job #1: 0.3, job #2: 0.7, job #3: 0.4, ..., job #N: 0.5
3. sort: job #15: 0.9, job #34: 0.89, ..., job #124: 0.75
4. take top 30
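The select / predict / sort / take-top-30 flow can be sketched as a single function. Here `select_candidates` and `predict_probability` are hypothetical callables standing in for the two layers of the solution; only the control flow is taken from the slide.

```python
def recommend(user, select_candidates, predict_probability, top_n=30):
    """Two-layer recommendation: preselect candidates, score them, keep the best."""
    candidates = select_candidates(user)
    # Score every candidate with the learned model (any callable works here).
    scored = [(predict_probability(user, item), item) for item in candidates]
    # Highest predicted probability first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_n]]
```

With the toy probabilities from the diagram (job #1: 0.3, job #2: 0.7, job #3: 0.4), a call with `top_n=2` would return job #2 followed by job #3.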

Training set
Training GT: positive interactions from the last week of the available data.
Local score: computed against the training GT.
Candidates and features are computed separately for the training set and for the full dataset!
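A minimal sketch of such a time-based split, assuming interactions are stored as `(user, item, kind, timestamp)` tuples with Unix timestamps in seconds. The tuple layout and the interaction kind names are assumptions made for illustration, not the dataset's actual schema.

```python
WEEK_SECONDS = 7 * 24 * 3600
POSITIVE_KINDS = {"click", "bookmark", "reply"}  # assumed labels


def split_last_week(interactions):
    """Return (history, ground_truth): everything before the last week,
    and the positive interactions that fall inside the last week."""
    if not interactions:
        return [], {}
    cutoff = max(row[3] for row in interactions) - WEEK_SECONDS
    history = [row for row in interactions if row[3] <= cutoff]
    ground_truth = {}
    for user, item, kind, timestamp in interactions:
        if timestamp > cutoff and kind in POSITIVE_KINDS:
            ground_truth.setdefault(user, set()).add(item)
    return history, ground_truth
```

Candidates and features for training would then be built from `history` only, so that nothing from the held-out week leaks into the model inputs.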

Candidates
Candidate: an item i with a high P[i ∈ GT(u)].
20 candidate categories.
Ranking within a category: e.g. sort interactions by timestamp.
∼300 candidates per user (0.1% of all items).
37% coverage of training GT.
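One way to measure how much of the ground truth a candidate set covers is sketched below. It assumes coverage is counted over (user, item) pairs; the slide does not say whether the 37% figure was computed this way, and the function name is hypothetical.

```python
def candidate_coverage(candidates_by_user, gt_by_user):
    """Fraction of ground-truth (user, item) pairs that appear among the candidates."""
    total = sum(len(items) for items in gt_by_user.values())
    if total == 0:
        return 0.0
    covered = sum(
        len(set(candidates_by_user.get(user, ())) & set(items))
        for user, items in gt_by_user.items()
    )
    return covered / total
```

This number is an upper bound for the second layer: an item that was never selected as a candidate cannot be recommended, however good the ranking model is.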

Candidate categories
User's interactions (Int(u)) sorted by week and by event count within the week,
Similarly for impressions (Imp(u)),
Int(u') for users u' sorted by Jaccard(Int(u), Int(u')).
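The Jaccard-based category can be sketched as follows, assuming `interactions` maps each user to the set of items they interacted with. Skipping items the user has already seen and stopping at neighbours with zero similarity are choices made for this sketch, not details from the slides.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidates_from_similar_users(user, interactions, limit=300):
    """Items of other users, visited in order of decreasing Jaccard similarity."""
    own = interactions.get(user, set())
    neighbours = sorted(
        ((jaccard(own, items), other)
         for other, items in interactions.items() if other != user),
        key=lambda pair: pair[0],
        reverse=True,
    )
    seen = set(own)
    result = []
    for similarity, other in neighbours:
        if similarity == 0.0:
            break  # remaining neighbours share nothing with this user
        for item in sorted(interactions[other]):
            if item not in seen:
                seen.add(item)
                result.append(item)
                if len(result) == limit:
                    return result
    return result
```

Comparing every pair of users is quadratic, so a dataset of realistic size would need an inverted index from items to users to find neighbours; that is left out here.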

Candidates (cold start)
items i sorted by max_{i' ∈ Int(u)} |tags(i) ∩ tags(i')|,
items i sorted by |jobroles(u) ∩ tags(i)|,
globally most popular items.
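Both tag-overlap rankings fit one helper, sketched here under the assumption that `item_tags` maps item ids (strings) to sets of tag ids. Passing the tag sets of the user's clicked items gives the first ranking, and passing a single set of the user's job roles gives the second. The function name and the tie-breaking by item id are hypothetical.

```python
def rank_by_tag_overlap(profile_tag_sets, item_tags, limit=300):
    """Rank items by the largest tag overlap with any set in profile_tag_sets."""
    def score(item):
        return max(
            (len(item_tags[item] & tags) for tags in profile_tag_sets),
            default=0,
        )

    scored = [(score(item), item) for item in item_tags]
    # Keep only items with some overlap; best score first, ties broken by item id.
    ranked = sorted(
        (pair for pair in scored if pair[0] > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [item for _, item in ranked[:limit]]
```

A user with no interactions and no job roles gets an empty list from this helper, which is where the globally most popular items would take over.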

Candidate ranking
XGBoost (gradient boosted decision trees).
Optimizing logloss.
Training file built from preselected candidates:
- all positive,
- sampled negative.
Reaches 77.5% of the score of a perfect ranking of the candidates.
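Building the training rows from preselected candidates might look like the sketch below: every candidate found in the training GT becomes a positive row, and a fixed number of the remaining candidates is sampled as negatives. The default of 5 negatives per user follows the "random 5" mentioned under possible improvements; the row layout and function name are assumptions.

```python
import random


def build_training_rows(candidates_by_user, gt_by_user, negatives_per_user=5, seed=0):
    """Return (user, item, label) rows: all positives plus sampled negatives."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    rows = []
    for user in sorted(candidates_by_user):
        candidates = candidates_by_user[user]
        truth = gt_by_user.get(user, set())
        positives = [item for item in candidates if item in truth]
        negatives = [item for item in candidates if item not in truth]
        sampled = rng.sample(negatives, min(negatives_per_user, len(negatives)))
        rows.extend((user, item, 1) for item in positives)
        rows.extend((user, item, 0) for item in sampled)
    return rows
```

Each row would then be expanded into a feature vector for the (user, item) pair before being passed to the gradient boosting model.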

Features
A feature maps a (user, item) pair to a real number.
273 features in total, in 12 groups.
Worked well with:
- highly correlated features,
- null values,
- no scaling/normalization.

Feature definitions (sample)
Event based item: percentage of Int(u) having the same property (e.g., employment) as item i.
Most similar user who clicked item: max_{u' ∈ Users(i)} Jaccard(Int(u), Int(u')).
Most similar item clicked by user: max_{i' ∈ Int(u)} Jaccard(Users(i), Users(i')).
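Two of these features are sketched below, assuming `interactions` maps users to item sets, `users_of_item` maps items to user sets, and `prop` maps items to a property value such as employment type. Returning `None` when a feature is undefined is a choice made for this sketch; all names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def share_with_same_property(user_items, item, prop):
    """Fraction of the user's items whose property equals the candidate item's."""
    target = prop.get(item)
    if not user_items or target is None:
        return None  # left as a null feature
    same = sum(1 for other in user_items if prop.get(other) == target)
    return same / len(user_items)


def most_similar_user_who_clicked(user, item, interactions, users_of_item):
    """Largest Jaccard similarity between the user and any other user of the item."""
    own = interactions.get(user, set())
    return max(
        (jaccard(own, interactions.get(other, set()))
         for other in users_of_item.get(item, ()) if other != user),
        default=None,
    )
```

The third feature, most similar item clicked by user, is symmetric: swap the roles of users and items and take the maximum of Jaccard(Users(i), Users(i')) over i' ∈ Int(u).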

Top feature groups
feature group                             fscore
event based user (item) profile           41%
    tags + title                           7%
item global popularity                    22%
    trend                                 10%
    weekday                                4%
most similar                              10%
    item clicked by user                   6%
    user who clicked item                  4%
user total events                          8%
    in last week                           4%
seconds from last user activity            7%
max common tags with clicked item          4%

Possible improvements
Training file:
- 8x bigger,
- sample 1/4 of negative candidates (instead of random 5) per user,
- score: +6.5k.
Ensembling models.
Layer scores:
- Candidates selection: 37%.
- Ranking candidates: 77.5%.

Thank you
apacuk@mimuw.edu.pl
mim-solutions.pl
