DataEngConf: Building the Next New York Times Recommendation Engine

Building the Next New York Times Recommendation Engine
By Alexander Spangher

Problem Statement:
1. The New York Times publishes over 300 articles, blog posts and interactive
stories a day.
Corpus:
n articles that are still relevant over the past x days

For each user:
30-day reading history (articles 1, 2, 3, 4, ...)

Machine Learning
“All of machine learning can be broken down into regression and matrix
factorization.”
-A drunk PhD student at a bar
1. Regression: f(input) = output
2. Factorization: f(output) = output
-Yann LeCun, 2015

Problem Statement (Refined)
1. Define pool of articles.
Not all articles expire at the same rate.
2. Rank-order articles based on reading history of user.
Assume that reader’s future preferences will match past preferences.

Evergreen Model
Features: Section, Desk, Word Count, ...
Metric: clicks per day
1. Learn training metric.
2. Learn relationship between features and metric.
3. Convert to interpretable expiration date.

Fit a parameter to each item i in the training set.

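Maximum Likelihood Estimate (MLE)
Likelihood function:
$$L(\theta; x_1, \dots, x_n) = f(x_1, \dots, x_n \mid \theta) = \prod_{j=1}^{n} f(x_j \mid \theta)$$
Left: likelihood of data and parameters.
Middle: joint pdf of data given parameter.
Right: product of independent pdf’s.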

Maximum Likelihood Estimate
Given timestamp of every click:

Maximum Likelihood Estimate
???

Maximum Log Likelihood Estimate
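
The slide text does not say which distribution is fitted to the click timestamps. As a minimal sketch, assume the delays between publication and each click are independent draws from an exponential distribution with rate `lam`. Under that assumption the log-likelihood has a closed-form maximiser: the number of clicks divided by the sum of the delays. The function names and sample data are hypothetical.

```python
import math

def log_likelihood(lam, gaps):
    """Log-likelihood of i.i.d. exponential delays: sum of log(lam) - lam * t."""
    return sum(math.log(lam) - lam * t for t in gaps)

def fit_rate(gaps):
    """Closed-form MLE, found by setting the derivative of the log-likelihood to zero."""
    return len(gaps) / sum(gaps)

# Hypothetical data: days elapsed between publication and each click.
gaps = [0.1, 0.3, 0.4, 1.2, 2.0]
lam_hat = fit_rate(gaps)

# The fitted rate should score at least as well as nearby candidate rates.
print(lam_hat, log_likelihood(lam_hat, gaps))
```

Taking logs turns the product of pdfs into a sum, which is why the log-likelihood is easier to differentiate and is numerically better behaved than the raw likelihood.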

Or, use optimization package:
Python: http://cvxopt.org/
Convex Optimization by Stephen Boyd and Lieven Vandenberghe

Learn relationship between article features and
x = [desk, word count, section, ...]
y =
General Linear model:
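
The model equation itself is not in the slide text. As a rough illustration of learning a relationship between article features and a per-article fitted value, the sketch below runs ordinary least squares on a single numeric feature (word count). This is the simplest linear model, not necessarily the one used in the talk. Categorical features such as desk or section would need an encoding (for example one-hot) in a fuller version. All names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predict the per-article value for a new article with feature x."""
    return intercept + slope * x

# Hypothetical training data: word count (thousands) and the value fitted per article.
word_counts = [0.5, 1.0, 2.0, 4.0]
fitted_values = [2.0, 1.6, 1.1, 0.2]
intercept, slope = fit_line(word_counts, fitted_values)
print(predict(intercept, slope, 3.0))
```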

Building the Recommender
(http://open.blogs.nytimes.com/2015/08/11/building-the-next-new-york-times-recommendation-engine/)

First Iteration
Keyword-Based model: TF-IDF Vector
N = number of times word appears in document
D = number of documents that word appears in
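
The slide defines N and D but the weighting formula did not survive in the text. The sketch below assumes one common variant, N * log(total documents / D). Other variants add smoothing or normalisation. The tiny corpus and the function name are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return the sorted vocabulary and one TF-IDF vector per tokenised document."""
    vocab = sorted({word for doc in docs for word in doc})
    # D: number of documents each word appears in.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    total = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # N: number of times each word appears in this document
        vectors.append([counts[w] * math.log(total / doc_freq[w]) for w in vocab])
    return vocab, vectors

# Hypothetical corpus of three tokenised documents.
docs = [
    ["cat", "cat", "dog", "fun"],
    ["dog", "scholar", "nice"],
    ["cat", "fun", "fun"],
]
vocab, vectors = tfidf_vectors(docs)
for vec in vectors:
    print([round(v, 2) for v in vec])
```

Words that appear in every document get weight zero under this variant, so each vector emphasises the words that distinguish one document from the rest.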

First Iteration
Keyword-Based model: TF-IDF Vector
[ 0.02, 0.5, 0, 0, … , .01 ]
[ 0.9, 0.01, 0.2, … , .05 ]
Vector dimensions (words): fun, cat, dog, scholar, nice

Feedback:
“Recommendations work for me
I have been following the Oscar Pistorius case for over a year now and every time there has been a
relevant story about the case, I have been recommended that story.
Recommendations seem to be working very well for me.”

Feedback:
“No More Brooks recommendations, please
Your constant pushing of David Brooks onto me is like an annoying grandmother who won't believe
you are really allergic to peanuts even though you regularly go into anaphylactic shock at her dinner
table and need to be rushed to the hospital. What can I say… you're killing me. Please stop it.
...
Thanks for your attention to this matter.”

Feedback:
“Dear NY Times,
You seem to have missed the fact that, while I do read the Weddings section, I only (or almost only)
read about the weddings of same sex couples.
Please stop recommending heterosexual weddings articles to me!!”

Second Iteration:
LDA-Based model: Topic Vector
[ 0.02, 0.5, 0, 0, … , .01 ]
[ 0.9, 0.01, 0.2, … , .05 ]
Vector dimensions (topics): 1, 2, 3, 4, ..., k

[Figure: example topic, shown as probability weight over the words cat, yarn, tree, building, car, money, bank, paw, toy, newspaper, Spotify]

[Figure: a second example topic, shown as probability weight over the same words]

How do we learn these parameters?
LDA Definition:
Choose 𝜃 ~ Dirichlet(ɑ)
For each word in document:
Choose word topic z ~ Mult(𝜃)
Choose word from the word distribution of topic z
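
The generative steps above can be sketched as a small simulation. This only shows the forward (sampling) direction of LDA, not how the parameters are learned. The vocabulary, the two topic-word distributions, and the function names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, topic_word_probs, vocab, n_words, rng):
    """Sample theta ~ Dirichlet(alpha), then a topic and a word for each position."""
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # Choose word topic ~ Mult(theta).
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        # Choose word from that topic's word distribution.
        word = rng.choices(vocab, weights=topic_word_probs[topic])[0]
        words.append(word)
    return theta, words

# Hypothetical vocabulary and two topics ("pets" and "finance").
vocab = ["cat", "yarn", "paw", "money", "bank", "car"]
topic_word_probs = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.4, 0.2],
]
rng = random.Random(0)
theta, words = generate_document([0.5, 0.5], topic_word_probs, vocab, 8, rng)
print(theta, words)
```

Inference runs this process in reverse: given only the observed words, estimate the topic mixture of each document and the word distribution of each topic.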

Variational Inference
Image borrowed from David Blei (2003)

Variational Inference (cont.)
1. (E-Step):
2. (M-Step):
tractable!!!

Collaborative Topic Modeling (CTM)
Image borrowed from David Blei (2011)
The graphical model for the CTM model we use.

Scaling the algorithm
Training procedure is batch. Do we have time to scale to all our users, in real time?

Strategy:
1. Iterate until some variables don’t change (article-topics).
2. Scale out, fixing non-changing variables. Update equation for one variable becomes a closed-form equation.

Algorithm
1. Batch train on training set of users
2. Fix article-topics and scale out to all users

Derive scores for users
As seen in:
http://benanne.github.io/2014/08/05/spotify-cnns.html!!

C parameter: the back-off average
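
The slides name the C parameter only as "the back-off average"; the exact formula is not in the text. One plausible reading, assumed here, is a smoothed average that shrinks a reader's topic vector toward a corpus-wide prior when they have read few articles, with `c` controlling the strength of the shrinkage. Scoring articles by a dot product with the user vector is likewise an assumption. All names and numbers are hypothetical.

```python
def backoff_average(read_vectors, prior, c):
    """Average the topic vectors of read articles, backing off toward `prior` with weight c."""
    n = len(read_vectors)
    k = len(prior)
    sums = [sum(vec[j] for vec in read_vectors) for j in range(k)]
    return [(sums[j] + c * prior[j]) / (n + c) for j in range(k)]

def score(user_vector, article_vector):
    """Dot product between a user's preference vector and an article's topic vector."""
    return sum(u * a for u, a in zip(user_vector, article_vector))

def rank_articles(user_vector, articles):
    """Return article ids sorted from highest to lowest score."""
    return sorted(articles, key=lambda aid: score(user_vector, articles[aid]), reverse=True)

# Hypothetical two-topic example.
prior = [0.5, 0.5]
history = [[0.9, 0.1], [0.7, 0.3]]
user = backoff_average(history, prior, c=2)
articles = {"a": [0.1, 0.9], "b": [0.8, 0.2]}
print(user, rank_articles(user, articles))
```

With an empty history the user vector equals the prior, so new readers get corpus-average recommendations; as the history grows, the prior's influence fades.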

Any vector-based algorithm.
1) Deep Network (Spotify’s audio-CNN)
2) Shallow Network (Doc2Vec)
3) Topic Model
4) pLSA

In conclusion
Modeling is fun!
All models are bad, but some can be useful!
Improve by recognizing shortfalls.
Evaluate on KPIs, on customer feedback, on design decisions.

not functional
sub-optimal
flat-lining/degrading
