By Alex Spangher (Data Engineer, New York Times Digital)
Machine learning is a discipline characterized by systematic approaches and common threads running through seemingly diverse problems. In this talk I'll discuss several approaches taken during our work on the next New York Times Recommendation Engine, focusing on spatial reasoning, dimensionality reduction, and testing strategies. Topics covered will include implicit regression, Bayesian modeling and neural networks. The emphasis throughout is on what these approaches have in common.
DataEngConf: Building the Next New York Times Recommendation Engine
1. Building the Next New York Times Recommendation Engine
By Alexander Spangher
2. Problem Statement:
1. The New York Times publishes over 300 articles, blog posts and interactive stories a day.
Corpus: the n articles from the past x days that are still relevant
5. Machine Learning
“All of machine learning can be broken down into regression and matrix factorization.”
-A drunk PhD student at a bar
1. Regression: f(input) = output
2. Factorization: f(output) = output
-Yann LeCun, 2015
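A minimal Python sketch of the two shapes named on this slide, not code from the talk. The function names `fit_line` and `rank_one` and the toy data are assumptions made for illustration. Regression learns a function from inputs to outputs; factorization learns row and column factors whose product reproduces the data itself.

```python
def fit_line(xs, ys):
    """Regression: learn f so that f(input) approximates output (least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def rank_one(matrix, steps=50):
    """Factorization: learn row/column factors whose product approximates the matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    cols = [1.0] * n_cols
    for _ in range(steps):
        # Power-iteration style update: project onto columns, normalise, project back.
        rows = [sum(m * c for m, c in zip(row, cols)) for row in matrix]
        norm = sum(r * r for r in rows) ** 0.5
        rows = [r / norm for r in rows]
        cols = [sum(matrix[i][j] * rows[i] for i in range(n_rows))
                for j in range(n_cols)]
    return rows, cols


# Toy data (hypothetical): a perfectly linear relation and a rank-1 matrix.
predict = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
toy = [[2.0, 4.0], [3.0, 6.0]]
row_factors, col_factors = rank_one(toy)
rebuilt = [[r * c for c in col_factors] for r in row_factors]
```

Here `predict` maps a new input to an output, while `rebuilt` reconstructs `toy` from its own factors, which is one way to read the "f(output) = output" line.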
6. Problem Statement (Refined)
1. Define pool of articles.
Not all articles expire at the same rate
2. Rank-order articles based on the user's reading history.
Assume that a reader’s future preferences will match past preferences
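The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Times' implementation: the `Article` fields, the per-article half-life used to model different expiry rates, the `min_freshness` cutoff, and the topic-overlap score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    slug: str
    age_hours: float         # time since publication
    half_life_hours: float   # hypothetical: how quickly this article goes stale
    topics: dict             # hypothetical: topic name -> weight


def freshness(article):
    """Exponential decay, so articles with a longer half-life expire more slowly."""
    return 0.5 ** (article.age_hours / article.half_life_hours)


def build_pool(articles, min_freshness=0.25):
    """Step 1: keep only the articles that are still relevant."""
    return [a for a in articles if freshness(a) >= min_freshness]


def user_profile(history):
    """Average the topic weights of everything the user has read."""
    profile = {}
    for article in history:
        for topic, weight in article.topics.items():
            profile[topic] = profile.get(topic, 0.0) + weight / len(history)
    return profile


def rank(pool, profile):
    """Step 2: order the pool by overlap with past reading (past predicts future)."""
    def score(article):
        return sum(profile.get(t, 0.0) * w for t, w in article.topics.items())
    return sorted(pool, key=score, reverse=True)


# Hypothetical data: a fast-expiring news item, a slow feature, and a fresh story.
candidates = [
    Article("breaking", 30, 12, {"courts": 1.0}),
    Article("feature", 72, 168, {"travel": 1.0}),
    Article("verdict", 6, 24, {"courts": 0.9, "sports": 0.1}),
]
history = [Article("trial-opens", 500, 24, {"courts": 1.0})]
ranked = rank(build_pool(candidates), user_profile(history))
ranked_slugs = [a.slug for a in ranked]  # ["verdict", "feature"]
```

The 30-hour-old "breaking" item is dropped because its short half-life puts it below the cutoff, while the three-day-old "feature" stays in the pool.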
23. Feedback:
“Recommendations work for me
I have been following the Oscar Pistorius case for over a year now and every time there has been a
relevant story about the case, I have been recommended that story.
Recommendations seem to be working very well for me.”
24. Feedback:
“No More Brooks recommendations, please
Your constant pushing of David Brooks onto me is like an annoying grandmother who won't believe
you are really allergic to peanuts even though you regularly go into anaphylactic shock at her dinner
table and need to be rushed to the hospital. What can I say… you're killing me. Please stop it.
...
Thanks for your attention to this matter.”
25. Feedback:
“Dear NY Times,
You seem to have missed the fact that, while I do read the Weddings section, I only (or almost only)
read about the weddings of same sex couples.
Please stop recommending heterosexual weddings articles to me!!”
38. Strategy:
1. Iterate until some variables (the article-topics) stop changing.
2. Scale out, fixing the non-changing variables. The update equation for each remaining variable becomes a closed-form equation.
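The slides at this point do not show the model itself, so the Python sketch below substitutes a rank-1 alternating least squares model as a stand-in to illustrate the strategy. All names (`closed_form_update`, `fit_until_articles_settle`, `score_new_users`) and the toy `reads` matrix are assumptions. Phase 1 alternates updates until the article factors settle; phase 2 freezes them, after which each user is solved in a single closed-form step.

```python
def closed_form_update(rows, fixed, reg=0.0):
    """Least-squares update for one side of a rank-1 model with the other side fixed."""
    denom = sum(f * f for f in fixed) + reg
    return [sum(r * f for r, f in zip(row, fixed)) / denom for row in rows]


def fit_until_articles_settle(reads, tol=1e-9, max_iter=100):
    """Phase 1: alternate both updates until the article factors stop changing."""
    by_article = [list(col) for col in zip(*reads)]  # transpose: one row per article
    users = [1.0] * len(reads)
    articles = [1.0] * len(by_article)
    for _ in range(max_iter):
        new_articles = closed_form_update(by_article, users)
        users = closed_form_update(reads, new_articles)
        shift = max(abs(a - b) for a, b in zip(articles, new_articles))
        articles = new_articles
        if shift < tol:
            break
    return users, articles


def score_new_users(new_reads, frozen_articles):
    """Phase 2: article factors are frozen, so each user is solved independently."""
    return closed_form_update(new_reads, frozen_articles)


# Hypothetical user-by-article engagement matrix (exactly rank 1 for clarity).
reads = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.0], [4.0, 8.0, 2.0]]
users, articles = fit_until_articles_settle(reads)

# A user who was not part of phase 1 needs no further iteration.
newcomer_row = [3.0, 6.0, 1.5]
newcomer = score_new_users([newcomer_row], articles)[0]
predicted = [newcomer * a for a in articles]
```

Because no user's phase 2 update depends on any other user, that step can in principle be distributed across machines, which is one plausible reading of "scale out" here.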
43. In conclusion
Modeling is fun!
All models are bad, but some can be useful!
Improve by recognizing shortfalls.
Evaluate on KPIs, on customer feedback, on design decisions.