2. WHERE’S BIG LEARNING?
Next: the Application Layer
  [Stack diagram: Storage → Database → Processing → Analytics / Machine Learning → Applications]
Like Apache Mahout
  Common Big Data app today
  Clustering, recommenders, classifiers on Hadoop
  Free, open source; not mature
Where’s commercialized Big Learning?
3. A RECOMMENDER SHOULD …
Answer in Real-time
  Ingest new data, now
  Modify recommendations based on newest data
  No “cold start” for new data
Accept Diverse Input
  Not just people and products
  Not just explicit ratings: clicks, views, buys
  Side information
Scale Horizontally
  For queries per second
  For size of data set
Be “Pretty Accurate”
4. NEED: 2-TIER ARCHITECTURE
Real-time Serving Layer
  Quick results based on a precomputed model
  Incremental update
  Partitionable for scale
Batch Computation Layer
  Builds the model
  Scales out (on Hadoop?)
  Asynchronous, occasional, long-lived runs
5. A PRACTICAL ALGORITHM
MATRIX FACTORIZATION
  Factor the user-item matrix into a user-feature matrix times a feature-item matrix
  Well understood in ML, as: Principal Component Analysis, Latent Semantic Indexing
  Several algorithms, like: Singular Value Decomposition, Alternating Least Squares
BENEFITS
  Models intuition
  Factorization is batch parallelizable
  Reconstruction (recs) in low dimension is fast
  Allows projection of new data
    Cold start solution
    Approximate update solution
6. A PRACTICAL IMPLEMENTATION
ALTERNATING LEAST SQUARES
  Simple factorization P ≈ X Yᵀ
  Approximate: X, Y are very “skinny” (low-rank)
  Faster than the SVD
  Trivially parallel, iterative
  Dumber than the SVD: no singular values, no orthonormal basis
BENEFITS
  Parallelizable by row -- Hadoop-friendly
  Iterative: OK answer fast, refine as long as desired
  Lends itself to a “binary” input model, with ratings as regularization instead
  Sparseness / 0s no longer a problem
7. ALS ALGORITHM 1
Input: (user, item, strength) tuples
  Anything you can quantify is input
  Strength is positive
  Many tuples per user-item
R is the sparse user-item interaction matrix
  rᵢⱼ = total strength of interaction between user i and item j

Example R (· = no interaction):
  1 4 3 · ·
  · · 3 · ·
  · 4 · 3 2
  5 · 2 · 3
  · · · 5 ·
  2 4 · · ·
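A minimal sketch of this input step (plain NumPy/SciPy, not Myrrix’s API; the tuples and 0-based ids are hypothetical, chosen to match the example matrix above):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical (user, item, strength) tuples; note (0, 1, *) appears
# twice -- many tuples per user-item pair are allowed and accumulate.
tuples = [(0, 0, 1.0), (0, 1, 3.0), (0, 1, 1.0), (0, 2, 3.0),
          (1, 2, 3.0), (5, 0, 2.0), (5, 1, 4.0)]

users, items, strengths = zip(*tuples)
# COO duplicates are summed on conversion, giving
# r_ij = total strength of interaction between user i and item j.
R = coo_matrix((strengths, (users, items)), shape=(6, 5)).tocsr()
print(R[0, 1])  # 4.0 = 3.0 + 1.0
```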
8. ALS ALGORITHM 2
Follow “Collaborative Filtering for Implicit Feedback Datasets”
  www2.research.att.com/~yifanhu/PUB/cf.pdf
Construct “binary” matrix P
  1 where R > 0
  0 where R = 0
Factor P, not R
  R returns in regularization
  Still sparse; implicit 0s fine

Example P:
  1 1 1 0 0
  0 0 1 0 0
  0 1 0 1 1
  1 0 1 0 1
  0 0 0 1 0
  1 1 0 0 0
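Continuing the sketch, constructing P from R is a single comparison:

```python
# "Binary" matrix P: 1 where R > 0, 0 where R = 0.
# The comparison keeps P sparse; the 0s stay implicit.
P = (R > 0).astype(float)
```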
9. ALS ALGORITHM 3
P is m x n
Choose k << m, n
Factor P as Q = X Yᵀ, with Q ≈ P
  X is m x k; Yᵀ is k x n
Find the best approximation Q
  Minimize the L2 norm of the difference: ‖P − Q‖₂
  Minimal squared error: “Least Squares”
Recommendations are the largest values in Q
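A sketch of this serving step, assuming factors X (m x k) and Y (n x k) from the batch layer (computed on the following slides); masking out items the user already has is an assumption about typical recommender behavior, not something the slide specifies:

```python
import numpy as np

def recommend(u, X, Y, R, top_n=3):
    # Row u of Q = X Yt: predicted association of user u with every item.
    q_u = X[u] @ Y.T
    # Assumed behavior: don't re-recommend already-seen items.
    q_u[R[u].toarray().ravel() > 0] = -np.inf
    return np.argsort(q_u)[::-1][:top_n]  # item indices, best first
```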
10. ALS ALGORITHM 4
Optimizing X and Y simultaneously is non-convex: hard
If X or Y is fixed, it becomes a system of linear equations: convex, easy
Initialize Y with random values
Solve for X
Fix X, solve for Y
Repeat (“Alternating”)
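A minimal sketch of the alternation, continuing the example; for clarity it uses the plain unweighted objective (the confidence weights cᵤᵢ arrive on the next slide), and k, λ, and the iteration count are illustrative, not Myrrix defaults:

```python
rng = np.random.default_rng(0)
m, n = P.shape
k, lam, iterations = 2, 0.1, 10
Pd = P.toarray()                 # dense only because the example is tiny
Y = rng.standard_normal((n, k))  # initialize Y with random values
for _ in range(iterations):
    # Fix Y, solve for X: a convex ridge-regression solve, row by row
    X = np.linalg.solve(Y.T @ Y + lam * np.eye(k), Y.T @ Pd.T).T
    # Fix X, solve for Y symmetrically; repeat ("alternating")
    Y = np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ Pd).T
```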
11. ALS ALGORITHM 5
Define regularization weights cᵤᵢ = 1 + α rᵤᵢ
Minimize:
  Σ cᵤᵢ(pᵤᵢ − xᵤᵀyᵢ)² + λ(Σ‖xᵤ‖² + Σ‖yᵢ‖²)
A simple least-squares regression objective, plus:
  Squared-error terms weighted by strength: the penalty for not reconstructing 1 is higher at a “strong” association
  A standard L2 regularization term
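The same objective spelled out numerically, continuing the sketch (α = 40 is the value the paper reports working well on its data; λ is arbitrary here):

```python
alpha, lam = 40.0, 0.1
C = 1.0 + alpha * R.toarray()   # c_ui = 1 + alpha * r_ui
err = Pd - X @ Y.T              # p_ui - x_u . y_i, for all u, i
loss = (C * err**2).sum() + lam * ((X**2).sum() + (Y**2).sum())
```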
12. ALS ALGORITHM 6
With fixed Y, compute the optimal X
Each row xᵤ is independent
Define Cᵤ as the diagonal matrix of cᵤ (user strength weights)
  xᵤ = (YᵀCᵤY + λI)⁻¹ YᵀCᵤpᵤ
Compare to the simple least-squares regression solution (YᵀY)⁻¹Yᵀpᵤ
  Adds the Tikhonov / ridge regression regularization term λI
  Attaches the cᵤ weights to Yᵀ
See the paper for how YᵀCᵤY is computed efficiently; skipping the engineering!
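A direct, unoptimized sketch of that per-row solve; it too skips the engineering (the paper computes YᵀCᵤY as YᵀY + Yᵀ(Cᵤ − I)Y so that only nonzero entries cost anything):

```python
def solve_user_row(u, Y, R, P, alpha=40.0, lam=0.1):
    c_u = 1.0 + alpha * R[u].toarray().ravel()  # diagonal of Cu
    p_u = P[u].toarray().ravel()
    k = Y.shape[1]
    # x_u = (Yt Cu Y + lambda I)^-1  Yt Cu p_u
    YtCuY = Y.T @ (c_u[:, None] * Y)
    return np.linalg.solve(YtCuY + lam * np.eye(k), Y.T @ (c_u * p_u))
```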
14. FOLD-IN
Need immediate, if approximate, updates for new data
New user u needs a new row Xᵤ, so that Qᵤ = Xᵤ Yᵀ
Compute Xᵤ via a right inverse: X Yᵀ(Yᵀ)⁻¹ = Q(Yᵀ)⁻¹, so: X = Q(Yᵀ)⁻¹
But what is (Yᵀ)⁻¹?
  Note (YᵀY)(YᵀY)⁻¹ = I
  Gives Yᵀ’s right inverse: Yᵀ(Y(YᵀY)⁻¹) = I
So Xᵤ = Qᵤ Y(YᵀY)⁻¹; we have Pᵤ ≈ Qᵤ, so Xᵤ ≈ Pᵤ Y(YᵀY)⁻¹
Recommend as usual: Qᵤ = Xᵤ Yᵀ
For an existing user, instead add to the existing row Xᵤ
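As a sketch, fold-in is then two small matrix products; the k x k inverse is cheap, and Y(YᵀY)⁻¹ can be precomputed once per model:

```python
def fold_in(p_u, Y):
    # Right inverse of Yt is Y (YtY)^-1, so x_u ≈ p_u Y (YtY)^-1
    right_inverse = Y @ np.linalg.inv(Y.T @ Y)  # n x k
    return p_u @ right_inverse                  # new user-feature row x_u
```

A new user’s preference row pᵤ thus gets recommendations via Qᵤ = Xᵤ Yᵀ without waiting for the next batch run.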
15. THIS IS MYRRIX
Soft-launched
  Serving Layer available as open source download
  Computation Layer available as beta
  Ready on Amazon EC2 / EMR
Full launch Q4 2012
srowen@myrrix.com
myrrix.com
17. EXAMPLES
STACKOVERFLOW TAGS
  Recommend tags to questions
  Tag questions automatically, improve tag coverage
  3.5M questions x 30K tags
  4.3 hours x 5 machines on Amazon EMR
  $3.03 ≈ $0.08 per 100,000 recs
WIKIPEDIA LINKS
  Recommend new linked articles from existing links
  Propose missing, related links
  2.5M articles x 1.8M articles
  28 hours x 2 PCs on Apache Hadoop 1.0.3