Contenu connexe Similaire à Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402 - re:Invent 2017 (20) Plus de Amazon Web Services (20) Building Content Recommendation Systems Using Apache MXNet and Gluon - MCL402 - re:Invent 20171. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Building Content Recommendation
Systems Using Apache MXNet and Gluon
C y r u s M V a h i d
c y r u s m v @ a m a z o n . c o m
P r i n c i p a l D e e p L e a r n i n g S o l u t i o n A r c h i t e c t @ A m a z o n A I
D e c e m b e r 1 , 2 0 1 7
2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
My Profile – amazon.co.uk My Profile – amazon.de
Motivation
3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Recommender systems are an important
mechanism to personalize and enhance customer
experience.
• Recommender systems can be applied to achieve
different goals:
• Increased time spent on a platform
• Bring to the attention of a customer
that an item might need complementary
pieces.
• In the heart of it all, the intention is to provide
the person interacting with the system the most
relevant results and content to that person.
Motivation
4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recommender System
Collaborative Filtering Using Matrix Factorization
Step by Step Coding
Challenges
Deep Structure Semantic Models (DSSM)
Production Notes
C o m m o n A p p r o a c h e s t o B u i l d a
5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• User based recommenders try to identify a short
list of other users who are "similar" to you who
have rated this new item. They then take a
weighted average of the rating for the item from
these users and predict that number as your likely
rating for that item.
• Item based recommenders try to identify a short
list of other items that are "similar" to the item
under question and take a weighted average of
the ratings for those top few items which you
provided and predict that number as your likely
rating for that item.
Collaborative Filtering
https://www.oreilly.com/ideas/deep-matrix-factorization-using-apache-mxnet?cmp=tw-data-na-article-engagement_sponsored+kibird
6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Collaborative Filtering
7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• User-Item matrices are very sparse. For instance
amazon.com provides a catalogue of millions of
items, but each user has rated only a handful of
items.
• Intention of a recommendation engine is to
complete the sparse matrix based on the known
data.
• Matrix Factorization factorizes a matrix to
separate matrices, that when multiplied
approximate to the completed matrix.
• For the sake of efficiency we would like to
factorize the matrix to a long and a wide matrices.
Matrix Factorization
https://www.oreilly.com/ideas/deep-matrix-factorization-using-apache-mxnet?cmp=tw-data-na-article-engagement_sponsored+kibird
8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
def plain_net(k):
# input
user = mx.symbol.Variable('user')
item = mx.symbol.Variable('item')
score = mx.symbol.Variable('score')
# user feature lookup
user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
# item feature lookup
item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
# predict by the inner product, which is elementwise product and then sum
pred = user * item
pred = mx.symbol.sum(data = pred, axis = 1)
pred = mx.symbol.Flatten(data = pred)
# loss layer
pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
return pred
net1 = plain_net(64)
mx.viz.plot_network(net1)
Linear Model
User
Embedding
Item
Embedding
∘
⨀⨁ score
output
9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
def get_one_layer_mlp(hidden, k):
# input
user = mx.symbol.Variable('user')
item = mx.symbol.Variable('item')
score = mx.symbol.Variable('score')
# user latent features
user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
user = mx.symbol.Activation(data = user, act_type='relu')
user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
# item latent features
item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
item = mx.symbol.Activation(data = item, act_type='relu')
item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
# predict by the inner product
pred = user * item
pred = mx.symbol.sum(data = pred, axis = 1)
pred = mx.symbol.Flatten(data = pred)
# loss layer
pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
return pred
Adding Non-Linearity
User
Embedding
Item
Embedding
∘
⨀⨁ score
output
dense dense
dense dense
10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Embedding Layer is where a networks extracts
importance of features from data.
• Embedding is frequently used in NLP. For instance
in sentiment analysis embedding distills sentiment
information form words.
Embedding Layer
https://www.oreilly.com/ideas/deep-matrix-factorization-using-apache-mxnet?cmp=tw-data-na-article-engagement_sponsored+kibird
11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recommender System
Collaborative Filtering Using Matrix Factorization
Step by Step Coding
Challenges
Deep Structure Semantic Models (DSSM)
Production Notes
C o m m o n A p p r o a c h e s t o B u i l d a
12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Coding Steps in gluon
• Loading Data into an Array Iterator
• Defining the Network
• Initializing the Network
• Choosing the Loss Function
• Choosing Optimizer
• Training the Model
• Measuring Performance
• Adding Non-linearity
• Measuring Performance
13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
train_data_iter = gluon.data.DataLoader(SparseMatrixDataset(train_data, train_label),
shuffle=True, batch_size=batch_size)
test_data_iter = gluon.data.DataLoader(SparseMatrixDataset(test_data, test_label),
shuffle=True, batch_size=batch_size)
class SparseMatrixDataset(gluon.data.Dataset):
def __init__(self, data, label):
assert data.shape[0] == len(label)
self.data = data
self.label = label
if isinstance(label, ndarray.NDArray) and len(label.shape) == 1:
self._label = label.asnumpy()
else:
self._label = label
def __getitem__(self, idx):
return self.data[idx, 0], self.data[idx, 1], self.label[idx]
def __len__(self):
return self.data.shape[0]
Loading Data into an Array Iterator
14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
class MFBlock(gluon.Block):
def __init__(self, max_users, max_items, num_emb, dropout_p=0.5):
super(MFBlock, self).__init__()
self.max_users = max_users
self.max_items = max_items
self.dropout_p = dropout_p
self.num_emb = num_emb
with self.name_scope():
self.user_embeddings = gluon.nn.Embedding(max_users, num_emb)
self.item_embeddings = gluon.nn.Embedding(max_items, num_emb)
def forward(self, users, items):
a = self.user_embeddings(users)
b = self.item_embeddings(items)
predictions = a * b
predictions = nd.sum(predictions, axis=1)
return predictions
Defining the Network
15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx, force_reinit=True)
Initializing the Network
16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
loss_function = gluon.loss.L2Loss()
Choosing the Loss Function
17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr, 'wd': wd, 'momentum':
0.9})
Choosing Optimizer
18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
pochs = 10
def train(data_iter, net):
for e in range(epochs):
print("epoch: {}".format(e))
for i, (user, item, label) in enumerate(train_data_iter):
user = user.as_in_context(ctx).reshape((batch_size,))
item = item.as_in_context(ctx).reshape((batch_size,))
label = label.as_in_context(ctx).reshape((batch_size,))
with mx.autograd.record():
output = net(user, item)
loss = loss_function(output, label)
loss.backward()
net.collect_params().values()
trainer.step(batch_size)
print("EPOCH {}: RMSE ON TRAINING and TEST: {}. {}".format(e,
eval_net(train_data_iter, net)),
eval_net(test_data_iter, net)))
return output
Training the Model
19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
EPOCH 0: RMSE ON TRAINING and TEST: 3.5231035657055676. 3.53353935778141
EPOCH 1: RMSE ON TRAINING and TEST: 3.3170668120235205. 3.3881871127337218
EPOCH 2: RMSE ON TRAINING and TEST: 1.7826270855292679. 2.011566595607996
EPOCH 3: RMSE ON TRAINING and TEST: 1.1699977076664567. 1.3101321067839862
EPOCH 4: RMSE ON TRAINING and TEST: 0.9599943867214024. 1.0654756610274314
EPOCH 5: RMSE ON TRAINING and TEST: 0.8661853047579527. 0.951712748876214
EPOCH 6: RMSE ON TRAINING and TEST: 0.8170913911752403. 0.8906555083304644
EPOCH 7: RMSE ON TRAINING and TEST: 0.7873748381830752. 0.8524201203614473
EPOCH 8: RMSE ON TRAINING and TEST: 0.7683705289691687. 0.8290898406863213
EPOCH 9: RMSE ON TRAINING and TEST: 0.7545523364305496. 0.8126902643978595
Measuring Performance
20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
class MFBlock(gluon.Block):
def __init__(self, max_users, max_items, num_emb, dropout_p=0.5):
super(MFBlock, self).__init__()
self.max_users = max_users
self.max_items = max_items
self.dropout_p = dropout_p
self.num_emb = num_emb
with self.name_scope():
self.user_embeddings = gluon.nn.Embedding(max_users, num_emb)
self.item_embeddings = gluon.nn.Embedding(max_items, num_emb)
self.dense = gluon.nn.Dense(num_emb, activation='relu')
def forward(self, users, items):
a = self.user_embeddings(users)
a = self.dense(a)
b = self.item_embeddings(items)
b = self.dense(b)
predictions = a * b
predictions = nd.sum(predictions, axis=1)
return predictions
Adding Non_Linearity
21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
EPOCH 0: RMSE ON TRAINING and TEST:
0.7397429539039732. 0.7727593539834022
EPOCH 1: RMSE ON TRAINING and TEST:
0.7340302160739899. 0.7680136477231979
EPOCH 2: RMSE ON TRAINING and TEST:
0.7290934163123369. 0.7636049427747726
EPOCH 3: RMSE ON TRAINING and TEST:
0.7489639871120453. 0.7812525823950768
EPOCH 4: RMSE ON TRAINING and TEST:
0.7219638488218189. 0.7587113852620124
EPOCH 5: RMSE ON TRAINING and TEST:
0.7221305305734277. 0.7613962271451951
EPOCH 6: RMSE ON TRAINING and TEST:
0.6964019835107028. 0.7500802919447422
EPOCH 7: RMSE ON TRAINING and TEST:
0.7040959010727703. 0.7654881411790848
EPOCH 8: RMSE ON TRAINING and TEST:
0.662088219345361. 0.7430168891996145
EPOCH 9: RMSE ON TRAINING and TEST:
0.641668664816767. 0.7430033217519522
Measuring Performance
EPOCH 0: RMSE ON TRAINING and TEST:
3.5231035657055676. 3.53353935778141
EPOCH 1: RMSE ON TRAINING and TEST:
3.3170668120235205. 3.3881871127337218
EPOCH 2: RMSE ON TRAINING and TEST:
1.7826270855292679. 2.011566595607996
EPOCH 3: RMSE ON TRAINING and TEST:
1.1699977076664567. 1.3101321067839862
EPOCH 4: RMSE ON TRAINING and TEST:
0.9599943867214024. 1.0654756610274314
EPOCH 5: RMSE ON TRAINING and TEST:
0.8661853047579527. 0.951712748876214
EPOCH 6: RMSE ON TRAINING and TEST:
0.8170913911752403. 0.8906555083304644
EPOCH 7: RMSE ON TRAINING and TEST:
0.7873748381830752. 0.8524201203614473
EPOCH 8: RMSE ON TRAINING and TEST:
0.7683705289691687. 0.8290898406863213
EPOCH 9: RMSE ON TRAINING and TEST:
0.7545523364305496. 0.8126902643978595
22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recommender System
Collaborative Filtering Using Matrix Factorization
Step by Step Coding
Challenges
Deep Structure Semantic Models (DSSM)
Production Notes
C o m m o n A p p r o a c h e s t o B u i l d a
23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Matrix Factorization is ideal for small catalogues
and can perform based on small amount of data.
• As the catalogues get larger memory becomes a
challenge.
• For instance MovieLens 20M has 27278 items and
138493 users. User x Item matrix will have a
dimension of 138,493 X 27,278 = 3,777,812,054.
From all possible ratings the dataset includes only
20000263. This means only 0.05 percent of the
dataset contains data and 99.95 percent is just
sparsity.
Limitations
ref: Leo Dirac – re:Invent 2016
24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Factorization Machines take advantage of
combining Support Vector Machines and MF in
order to scale and deal with sparsity.
• The problem FM is that they run on a single
machine and have huge memory requirements
Scaling Is the Issue…
• DiFacto or Distributed Factorization Machines are
capable of distributing computation of sub-
gradients on minibatches asynchronously and thus
distribution load to several machines.
(Alex Smola et al. - 2016)
parameter server asynch SGD
25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• New users have no previous raring history.
• New items have never been rated before.
• Implementing recommendation without having
any prior information for platforms that no user-
login has been required.
Cold-Start Problem
itm0 itm1 itm2 itm3 itm4 itm5 itm6 itm7 itm8 itm9 itm10
u0 2 ? ? ? ? ? 4 ? ? ? ?
u1 ? 2 3 ? 4 ? ? 3 ? 5 ?
u2 2 ? ? ? 4 ? 4 ? ? ? ?
u3 1 ? ? 1 ? ? ? ? 1 ? ?
u4 5 ? ? 2 ? ? ? ? ? 2 ?
u5 ? ? 3 ? ? ? ? ? ? 5 ?
u6 ? 1 ? ? ? ? 4 ? 1 ? ?
u7 ? ? ? ? ? ? 1 ? 2 3 ?
u8 ? ? ? 5 ? 3 ? ? ? ? ?
u9 ? 2 3 ? 3 ? ? 3 ? 5 ?
u10 ? ? ? ? ? ? ? ? ? ? ?
26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• We might not have user data, but often there is a
wealth of product information available. We can
use this data in order to recommend similar
products to a user.
Content (Feature)-Based
code category sub category weight price colour dimentions
itm1 1 1 20 123.1 blue 23x12x19
itm2 2 1 20 900 white 23x12x20
itm3 1 1 22 123.1 green 20x10x18
itm4 3 1 20 600 blue 23x12x22
itm5 1 2 19 200 yellow 23x12x23
itm6 1 1 1 12 red 2x1x3
itm7 1 1 900 2000 blue 100x80x99
itm8 9 8 20 6000 grey 99x99x99
itm9 7 5 1000 123.1 blue 123x5x8
itm10 9 8 20 5000 brown 99x99x99
27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Content Based
28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• Hybrid Models combine item/user-based models
with content based.
• After searching for travel guides for Peru,
Argentina, and Brasil and while being in Germany,
I received the recommendation in the opposite
pane.
• The recommender has combined my location,
offered me a South American travel guide, and
has given more weight to a specific publisher that
I deliberately clicked-on more frequently.
Hybrid Models
29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recommender System
Collaborative Filtering Using Matrix Factorization
Step by Step Coding
Challenges
Deep Structure Semantic Models (DSSM)
Production Notes
C o m m o n A p p r o a c h e s t o B u i l d a
30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• There is still a wealth of information we have not
tapped into.
• Movies have images.
• Images can be captioned.
• Products have names, while we have so
far reduced them to item numbers.
• TV series have episode names.
• Products have verbose reviews.
• We can harvest all of this information and enrich
our recommendation system.
Most of the Data Is Still Untapped
There is a strong similarity in the ambience and
composition of these images:
31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
• A Matrix Factorization solution in its core is
multiplication of two matrices.
• Neural Networks are good at picking up semantic
intent at phrase/sentence level.
• Neural Networks perform well at image
captioning.
• Output of network is a tensor.
• So we can use output of several networks as our
embedding layer for an enriched recommendation
system.
DSSM
User
Embedding
Item
Embedding
∘
⨀⨁ score
output
32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DSSM
User
Embedding
Item
Embedding
⨀
⨀⨁ score
output
user Search
BOW
title words
BOW
resnet: imgs
dropout
dense densedensedensedense
concat concat
densedense
33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Recommender System
Collaborative Filtering Using Matrix Factorization
Step by Step Coding
Challenges
Deep Structure Semantic Models (DSSM)
Production Notes
C o m m o n A p p r o a c h e s t o B u i l d a
34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Development and Deployment
• One of the most important areas to pay attention to is the loss function.
• Multi-label cross-entropy loss has worked well in the past.
• This is relatively simple to apply to a wide variety of model types. Ranking loss often works better, but is
more complex to apply correctly.
• Scalability is always a big challenge. Offline batch computation and saving the results can help.
35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Logging and Measurement
• Deploying a recommender system requires some care since a model only succeeds if good behavioral data
can be logged.
• Moreover, without good logging it is impossible to assess the quality of the deployment. Tools such as
Amazon Kinesis are ideally suited for this purpose.
• Display bias is very strong – this means that customers are more likely to click on a mediocre
recommendation that they see than an excellent recommendation they don’t see.
• An improved model might not outperform the baseline when evaluated on historical data (and a model
that performs well on historical data might not work well on future data).
• Online comparisons are critical. This suggests an iterative strategy to building out a recommender system,
starting with a relatively simple model that is easy to deploy, and iteratively testing improvements.
36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Metrics
• Some common and effective metrics to measure are
• Recall: How many of the relevant items were selected.
• Precision: How many items were selected – Optimize for precision and for top x.
• Try to increase the number of customers that viewed a product with some frequency.
• Weblab: take half of the customers and check the end-result.
• Time spent on the platform
• Distinct days on platform
• Forecast customer score
37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How to Choose a Model
Iterative
Process
à à à à
Data Available Limited user
data
Binary user-item
interaction
Limited user
data
Additional user-
item interaction
User data
Extensive item
data
Extensive user
data
Extensive item
data
Relevant
Algorithms
MF
Binary
MF
FM
DiFacto
DSSM Customized and
more advanced
DSSM
Relative
Complexity
2 4 5 5
Deployment
Considerations
• Historical data size – 30d/60d/1y…
• Fine-tuning techniques (daily, weekly..)
• Inference – compressed model? Trade-off between model complexity
and inference latency
• Validation system setup
• Iterate fast and simple
38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
References
• https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
• http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
• https://www.cs.cmu.edu/~muli/file/difacto.pdf
• http://papers.www2017.com.au.s3-website-ap-southeast-2.amazonaws.com/proceedings/p173.pdf
• http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/
• https://arxiv.org/pdf/1310.4546.pdf
• http://faculty.washington.edu/xiaohe/UW_DeepNLP_slides.pdf
• https://www.microsoft.com/en-us/research/publication/learning-deep-structured-semantic-models-for-
web-search-using-clickthrough-data/
• http://alex.smola.org/teaching/berkeley2012/recommender.html
• http://papers.www2017.com.au.s3-website-ap-southeast-
2.amazonaws.com/proceedings/p173.pdfhttp://www.simafore.com/blog/bid/207476/The-intuition-
behind-recommender-systems-collaborative-filtering
• https://arxiv.org/pdf/1312.4894.pdf
• http://www.simafore.com/blog/bid/207476/The-intuition-behind-recommender-systems-collaborative-
filtering
39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Thank you!
c y r u s m v @ a m a z o n . c o m