4. Netflix scale
● > 69M members
● > 50 countries
● > 1000 device types
● > 3B hours/month
● 36% of peak US downstream traffic
5. Recommendations @ Netflix
● Goal: Help members find content
to watch and enjoy to maximize
satisfaction and retention
● Over 80% of what people watch
comes from our recommendations
● Top Picks, Because you Watched,
Trending Now, Row Ordering,
Evidence, Search, Search
Genre Rows, ...
9. When tackling a new problem
● What offline metrics can we compute that capture what online improvements
we’re actually trying to achieve?
● How should the input data to that evaluation be constructed (train, validation, test)?
● How fast and easy is it to run a full cycle of offline experimentation?
○ Minimize time to first metric
● How replicable is the evaluation? How shareable are the results?
○ Provenance (see Dagobah)
○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
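As an illustration of the first two questions, here is a minimal sketch of a time-based train/validation/test split and one common offline ranking metric (recall@k). The event layout, the function names `temporal_split` and `recall_at_k`, and the choice of metric are assumptions made for this example, not something the slides specify.

```python
from typing import List, Set, Tuple

# Hypothetical event layout: (member_id, video_id, timestamp).
Event = Tuple[str, str, int]


def temporal_split(
    events: List[Event], train_end: int, valid_end: int
) -> Tuple[List[Event], List[Event], List[Event]]:
    """Split events by time so evaluation data is strictly later than training data."""
    train = [e for e in events if e[2] < train_end]
    valid = [e for e in events if train_end <= e[2] < valid_end]
    test = [e for e in events if e[2] >= valid_end]
    return train, valid, test


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    events = [("m1", "a", 1), ("m1", "b", 5), ("m2", "c", 8), ("m2", "d", 12)]
    train, valid, test = temporal_split(events, train_end=5, valid_end=10)
    print(len(train), len(valid), len(test))
    print(recall_at_k(["a", "b", "c"], {"a", "c", "z"}, k=2))
```

Splitting by time rather than at random is one way to keep the offline setup closer to the online one, where a model is always scored on plays that happen after it was trained.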
10. When tackling an old problem
○ Are the metrics designed when first running experimentation in that space still appropriate now?
12. 1. For each combination of hyper-parameters
(e.g. grid search, random search, Gaussian processes…)
2. For each subset of the training data
a. Multi-core learning (e.g. HogWild)
b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
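The nested loops above can be sketched roughly as follows. This is a toy, single-threaded stand-in: `fit_ridge_1d` is a hypothetical one-dimensional ridge regression used only to have something to train, and the helper names `grid`, `random_search` and `search` are assumptions for this example. In a real setup the innermost training call is where multi-core or distributed learning would plug in.

```python
import itertools
import random
from typing import Dict, Iterable, Iterator, List, Tuple

Data = Tuple[List[float], List[float]]  # (inputs, targets)


def grid(space: Dict[str, list]) -> Iterator[dict]:
    """Enumerate every combination of hyper-parameter values (grid search)."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_search(space: Dict[str, list], n_trials: int, seed: int = 0) -> Iterator[dict]:
    """Sample hyper-parameter combinations uniformly at random."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {k: rng.choice(v) for k, v in space.items()}


def fit_ridge_1d(xs: List[float], ys: List[float], l2: float) -> float:
    """Toy model: closed-form 1-D ridge regression without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)


def mse(w: float, xs: List[float], ys: List[float]) -> float:
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def search(train_subsets: Dict[str, Data], valid: Data, configs: Iterable[dict]):
    """Return (validation error, params, subset name) with the lowest error."""
    results = []
    for params in configs:                            # 1. each hyper-parameter combination
        for name, (xs, ys) in train_subsets.items():  # 2. each subset of the training data
            w = fit_ridge_1d(xs, ys, params["l2"])    # a./b. parallel training would go here
            results.append((mse(w, *valid), params, name))
    return min(results, key=lambda r: r[0])


if __name__ == "__main__":
    subsets = {"all": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
               "first_two": ([1.0, 2.0], [2.0, 4.0])}
    valid = ([4.0, 5.0], [8.0, 10.0])
    print(search(subsets, valid, grid({"l2": [0.0, 10.0]})))
```

Because the two outer loops are independent of each other, they are also natural candidates for running in parallel across machines, separately from any parallelism inside the training step itself.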
13. When to use distributed learning?
● The impact of communication overhead when building distributed ML
algorithms is non-trivial
● Is your data big enough that the distribution offsets the communication overhead?
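One way to reason about that trade-off is a back-of-envelope cost model. The sketch below assumes perfectly parallel compute plus a fixed synchronization cost per epoch; the function names and the model itself are illustrative assumptions, and real communication costs often grow with model size and worker count.

```python
def single_machine_time(n_examples: int, sec_per_example: float, epochs: int = 1) -> float:
    """Training time on one machine: pure compute, no communication."""
    return n_examples * sec_per_example * epochs


def distributed_time(
    n_examples: int,
    sec_per_example: float,
    workers: int,
    sync_sec_per_epoch: float,
    epochs: int = 1,
) -> float:
    """Assumed model: compute divides evenly across workers, plus a fixed sync cost per epoch."""
    compute = n_examples * sec_per_example * epochs / workers
    return compute + sync_sec_per_epoch * epochs


def break_even_examples(sec_per_example: float, workers: int, sync_sec_per_epoch: float) -> float:
    """Dataset size at which both setups take the same time under the model above.

    Solves n*c = n*c/w + s for n, giving n = s / (c * (1 - 1/w)).
    """
    if workers < 2:
        raise ValueError("need at least 2 workers for a break-even point")
    return sync_sec_per_epoch / (sec_per_example * (1 - 1 / workers))


if __name__ == "__main__":
    c, w, s = 0.001, 4, 3.0  # made-up numbers for illustration only
    print(break_even_examples(c, w, s))
    for n in (1_000, 10_000):
        print(n, single_machine_time(n, c), distributed_time(n, c, w, s))
```

Under these made-up numbers, the small dataset trains faster on a single machine and the larger one benefits from distribution, which is the pattern the question on the slide is probing for.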
16. Example development process
[Diagram labels: Idea, Data, Code, (A/B test)]