1. Large-Scale Recommendation Systems Workshop
RecSys 2013, Hong Kong
Recommendation at Netflix Scale
Justin Basilico
Netflix Algorithm Engineering
October 13, 2013
11. Genre Personalization
Personalized genre rows focus on user interest
Also provide context and “evidence”
How are they generated?
  Implicit: based on user’s recent plays, ratings, & other interactions
  Explicit taste preferences
  Hybrid: combine the above
Also take into account:
  Freshness - has this been shown before?
  Diversity - avoid repeating tags and genres, limit number of TV genres, etc.
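The hybrid generation and the freshness and diversity adjustments above can be sketched roughly as follows. This is only an illustration, not Netflix's method: the `GenreRow` fields, the blending weight `alpha`, the `freshness_decay` discount, and the greedy tag-overlap rule are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class GenreRow:
    name: str
    tags: frozenset
    implicit: float   # hypothetical affinity inferred from plays, ratings, other interactions
    explicit: float   # hypothetical affinity from stated taste preferences
    times_shown: int  # how often this row was already displayed to the member

def hybrid_score(row, alpha=0.7, freshness_decay=0.8):
    """Blend implicit and explicit signals, then discount rows shown before."""
    blended = alpha * row.implicit + (1 - alpha) * row.explicit
    return blended * (freshness_decay ** row.times_shown)

def select_rows(candidates, k=3, max_tag_overlap=0):
    """Greedy pick by score, skipping rows whose tags repeat those already chosen."""
    chosen, used_tags = [], set()
    for row in sorted(candidates, key=hybrid_score, reverse=True):
        if len(row.tags & used_tags) > max_tag_overlap:
            continue  # diversity: too similar to a row we already picked
        chosen.append(row)
        used_tags |= row.tags
        if len(chosen) == k:
            break
    return chosen

rows = [
    GenreRow("Dark Thrillers", frozenset({"dark", "thriller"}), 0.9, 0.5, 0),
    GenreRow("Gritty Thrillers", frozenset({"gritty", "thriller"}), 0.8, 0.6, 0),
    GenreRow("Feel-good Comedies", frozenset({"feel-good", "comedy"}), 0.4, 0.9, 0),
    GenreRow("Quirky Comedies", frozenset({"quirky", "comedy"}), 0.7, 0.7, 3),
]
selected = [row.name for row in select_rows(rows, k=2)]
```

Here the second thriller row is skipped because it repeats a tag, and the comedy row that was already shown three times is discounted below the fresh one.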
12. Similars
Displayed in many contexts
  Video display page
  In response to user actions (search, queue add, …)
  “Because you watched” rows
17. Netflix Data
> 37M members
> 40 countries
> 1000 device types
Ratings: > 4M/day
Searches: > 3M/day
Plays: > 30M/day
1B hours in June 2012
> 4B hours in Q1 2013
Log 100B events/day
32.25% of peak US downstream traffic
18. Plays
● What people watch
● The most important source of data for our algorithms
● A few plays are usually more valuable than most of our other data
● We have a lot of information associated with a play:
  ○ Duration
  ○ Start/stop/pause/rewind
  ○ Device, location, time, …
  ○ Page context
  ○ …
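As a rough illustration of the information attached to a play, the record below carries a few of the fields listed above and turns watch duration into an implicit preference signal. The `PlayEvent` field names, the `implicit_preference` function, and the 10% cutoff are hypothetical choices for this sketch and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass
class PlayEvent:
    member_id: str
    video_id: str
    seconds_watched: float
    video_length_seconds: float
    device: str
    page_context: str  # e.g. which row the play started from

def implicit_preference(play, min_fraction=0.1):
    """Map a play to a value in [0, 1]; very short plays count as no signal."""
    fraction = min(play.seconds_watched / play.video_length_seconds, 1.0)
    return 0.0 if fraction < min_fraction else fraction

half_watched = PlayEvent("m1", "v1", 2700.0, 5400.0, "tv", "because-you-watched")
abandoned = PlayEvent("m1", "v2", 60.0, 5400.0, "phone", "search")
signals = [implicit_preference(half_watched), implicit_preference(abandoned)]
```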
19. Ratings
Explicit information about a member’s taste should be great
But we find ratings are…
  Noisy
  Sparse
  Biased
Quality of our ratings has decreased over time
20. Metadata
● Our tag space is made of thousands of different concepts
● Manually annotated by a set of experts
● Although an automatic approach may be possible, we believe it would be of lesser quality
  ○ However, we are researching automatic annotation of scenes, transitions…
● Metadata is useful
  ○ Especially for coldstart
21. Social
● Can your “friends’” interests help us predict yours better?
● The answer is similar to the Metadata case:
  ○ If we know enough about you, social information becomes less useful
  ○ But it is very interesting for coldstarting
● Social support for recommendations has been shown to matter
22. Affordances
Highly curated catalog
Catalog changes daily
Videos have long shelf-lives
Videos take time to consume
28. Cloud Computing at Netflix
Layered services
Clusters: Horizontal scaling
Auto-scale with demand
Plan for failure
Replication
Fail fast
State is bad
Simian Army: Induce failures to ensure resiliency
29. System Overview
Blueprint for multiple personalization algorithm services
  Ranking
  Row selection
  Ratings
  …
Recommendation involving multi-layered Machine Learning
[Architecture diagram. Layers: OFFLINE, NEARLINE, ONLINE. Components: Offline Data, Query results, Netflix.Hermes, Model training, Offline Computation, Models, Netflix.Manhattan, Nearline Computation, User Event Queue, Event Distribution, Algorithm Service, Online Data Service, Online Computation, UI Client, Member. Each computation box contains a Machine Learning Algorithm. The member sends Play, Rate, Browse... and receives Recommendations.]
30. Event & Data Distribution
Collect actions
  Plays, browsing, searches, ratings, etc.
Events
  Small units
  Time sensitive
Data
  Dense information
  Processed for further use
  Saved
31. Computation Layers
Offline
  Process data
Nearline
  Process events
Online
  Process requests
32. Online Computation
Synchronous computation in response to a member request
Pros:
  Access to most fresh data
  Knowledge of full request context
  Compute only what is necessary
Cons:
  Strict Service Level Agreements
    Must respond quickly … in all cases
    Requires high availability
  Limited view of data
Good for:
  Simple algorithms
  Model application
  Business logic
  Context-dependence
  Interactivity
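The "must respond quickly … in all cases" constraint can be illustrated with a deadline around the synchronous computation. This is a minimal sketch under stated assumptions: the function names, the 50 ms default deadline, and the idea of returning a precomputed popular list on timeout are illustrative, not a description of Netflix's services.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool. A per-request pool used as a context manager would block
# on shutdown until the slow task finished, which would defeat the deadline.
_executor = ThreadPoolExecutor(max_workers=4)

def recommend_online(compute, member_id, fallback, deadline_seconds=0.05):
    """Run `compute` for one request, but never wait past the deadline."""
    future = _executor.submit(compute, member_id)
    try:
        return future.result(timeout=deadline_seconds)
    except FutureTimeout:
        return fallback  # too slow: answer anyway with a safe default
    except Exception:
        return fallback  # computation failed: same safe default

def fast_model(member_id):
    return ["video-1", "video-2"]

def slow_model(member_id):
    time.sleep(0.5)  # simulates an algorithm too expensive for the online path
    return ["video-3"]

popular = ["popular-1", "popular-2"]
fast_result = recommend_online(fast_model, "m1", popular, deadline_seconds=1.0)
slow_result = recommend_online(slow_model, "m1", popular)
```

The slow call still returns within the deadline, which is why complex algorithms are usually a poor fit for this layer.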
33. Offline Computation
Asynchronous computation done on a regular schedule
Pros:
  Can handle large data
  Can do bulk processing
  Relaxed time constraints
Cons:
  Cannot react quickly
  Results can become stale
Good for:
  Batch learning
  Model training
  Complex algorithms
  Precomputing
34. Nearline Computation
Asynchronous computation in response to a member event
Pros:
  Can keep data fresh
  Can run moderate complexity algorithms
  Can average computational cost across users
  Change from actions
Cons:
  Has some delay
  Done in event context
Good for:
  Incremental learning
  User-oriented algorithms
  Moderate complexity algorithms
  Keeping precomputed results fresh
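A small sketch of the nearline pattern: member events are queued when they happen and processed asynchronously, so stored results are refreshed incrementally but with some delay. The `NearlineUpdater` class, the genre-affinity profile, and the decay factor are hypothetical and only stand in for whatever user-oriented algorithm runs in this layer.

```python
from collections import defaultdict, deque

class NearlineUpdater:
    def __init__(self, decay=0.9):
        self.decay = decay
        self.events = deque()              # stands in for a user event queue
        self.affinity = defaultdict(dict)  # member_id -> {genre: score}

    def publish(self, member_id, genre, weight=1.0):
        """Called when the member acts; only enqueues, does no computation."""
        self.events.append((member_id, genre, weight))

    def drain(self):
        """Process queued events, updating each affected profile incrementally."""
        processed = 0
        while self.events:
            member_id, genre, weight = self.events.popleft()
            profile = self.affinity[member_id]
            for known in profile:
                profile[known] *= self.decay  # older interests fade a little
            profile[genre] = profile.get(genre, 0.0) + weight
            processed += 1
        return processed

updater = NearlineUpdater()
updater.publish("m1", "comedy")
updater.publish("m1", "drama")
before = dict(updater.affinity.get("m1", {}))  # still empty: the update is delayed
processed = updater.drain()
after = dict(updater.affinity["m1"])
```

Only the profile of the member who acted is touched, which is one way to read "can average computational cost across users".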
35. Where to place components?
Example: Matrix Factorization
Offline:
Collect sample of play data
Run batch learning algorithm to
produce factorization
Publish item factors
Nearline:
Solve user factors
Compute user-item products
Combine
Online:
Presentation-context filtering
Serve recommendations
[Diagram annotations on the system overview: $X$ (offline data), $X \approx UV^\top$ (model training), $V$ (models), $A u_i = b$ and $s_{ij} = u_i \cdot v_j$ (nearline computation), $s_{ij} > t$ (online computation)]
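The nearline and online steps of this placement can be sketched as follows. The slide only gives $A u_i = b$; taking $A$ and $b$ to be the ridge-regularized normal equations over the items a member has played is an assumption of this sketch, as are the function names, the toy item factors, and the rule of filtering out already-played videos. The offline batch factorization is not shown: the item factors $V$ are treated as already published.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def solve_user_factors(item_factors, plays, reg=0.1):
    """Nearline: build A = sum(v v^T) + reg*I and b = sum(x v), then solve A u = b."""
    k = len(next(iter(item_factors.values())))
    A = [[reg if r == c else 0.0 for c in range(k)] for r in range(k)]
    b = [0.0] * k
    for video_id, x in plays.items():
        v = item_factors[video_id]
        for r in range(k):
            b[r] += v[r] * x
            for c in range(k):
                A[r][c] += v[r] * v[c]
    return solve_linear(A, b)

def score_items(u, item_factors):
    """Nearline: s_ij = u_i . v_j for every item j."""
    return {vid: sum(a * c for a, c in zip(u, v)) for vid, v in item_factors.items()}

def serve(scores, already_played, threshold):
    """Online: keep s_ij > t, apply a context filter, return best first."""
    keep = [(s, vid) for vid, s in scores.items()
            if s > threshold and vid not in already_played]
    return [vid for s, vid in sorted(keep, reverse=True)]

# Toy item factors standing in for the V published by the offline job.
V = {"action_a": [1.0, 0.0], "action_b": [0.9, 0.1],
     "drama_a": [0.0, 1.0], "drama_b": [0.1, 0.9]}
u = solve_user_factors(V, {"action_a": 1.0})
recommendations = serve(score_items(u, V), {"action_a"}, threshold=0.5)
```

Solving for one member's factors is a small $k \times k$ system, which is why it fits the nearline layer while the full factorization stays offline.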
36. Netflix Manhattan
Stan Lanning
Event-based precomputation framework
Supports both nearline and offline computation modes
Customer-centric events and data
[Diagram components: Play Service, Rating Service, …, Event Queue, Event Handlers, Event Rules, Request Queue, Managers, Algorithms, Cached User Data]
37. Signals & Models
Similar pattern across layers
Models
  Parameter files
  Trained offline
Data
  Previously processed and stored information
Signals
  Fresh data from live services
  User-related or context-related
38. Recommendation Results
Precomputed results
  Fetch from data store
  Post-process in context
Generated on the fly
  Collect signals, apply model
Combination
  Dynamically choose
  Fallbacks
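The three ways of producing results can be combined in one lookup path, sketched below. The function name, the dictionary standing in for a data store, the `already_on_page` context key, and the order in which sources are tried are all assumptions for illustration.

```python
def get_recommendations(member_id, context, store, online_model, fallback):
    """Prefer precomputed results, else compute on the fly, else fall back."""
    precomputed = store.get(member_id)
    if precomputed is not None:
        # Fetch from data store, then post-process in the request context.
        shown = context.get("already_on_page", set())
        return [video for video in precomputed if video not in shown]
    try:
        # Generated on the fly: collect signals (here, the context) and apply a model.
        return online_model(member_id, context)
    except Exception:
        return fallback  # last resort so the page is never empty

store = {"m1": ["a", "b", "c"]}  # hypothetical precomputed results

def model_ok(member_id, context):
    return ["x", "y"]

def model_down(member_id, context):
    raise RuntimeError("model unavailable")

popular = ["popular-1"]
from_store = get_recommendations("m1", {"already_on_page": {"b"}}, store, model_ok, popular)
on_the_fly = get_recommendations("m2", {}, store, model_ok, popular)
degraded = get_recommendations("m2", {}, store, model_down, popular)
```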
41. Take Aways
Behind-the-scenes peek at a real-world, industrial-scale recommender system
Recommendation is not just ratings
Scaling is not only about batch, offline algorithms
Use application domain advantages