2. ● Online marketplace
● Over 380k sellers
● The largest e-commerce platform in Poland
● 6th largest e-commerce platform in the EU
● ~ 15 million transactions monthly
● ~ 20 million accounts
● ~ 110 million offers (~ 20 million unique products)
4. Recommender systems
Collaborative-filtering (CF)
● Similarity models based on past users’ behaviour
● Two main subtypes:
○ item-to-item
○ user-to-item
● Model built on either explicit or implicit feedback
Content-based (CB)
● Based on product content
○ Description
○ Meta-data
○ Images
5. Recommender systems
Which is better?
● In general, CF models perform better
● Sometimes we explicitly want content-based recommendations
○ Fashion
● CF cannot always be applied
○ Cold-start scenarios
○ Data sparsity
Hybrid recommender systems
● Use CF where possible
● Use CB where CF cannot be applied
● Can we do better?
6. Recommendations @ allegro
● Automatic recommendations generate 10% of allegro’s GMV (total sales)
● 33 different recommendation scenarios / placements
● Available on 🖥 desktop, 📱 mobile, and ✉ e-mail
11. Generic framework
Main components
● Training data preparation
● Representation learning module
● Nearest-neighbour search in latent space
● Serving
12. Training data preparation
● Different scenario -> different logic -> different data
● Collaborative filtering
○ Sequences of visited product ids from user sessions
○ Purchases or carts
● Content-based recommendations
○ Product title with concatenated category path
○ Product images
● We use Scio - a Scala wrapper around the Apache Beam Java API
○ Concise syntax
○ Good testability
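The real pipeline runs on Scio/Apache Beam, but the core transformation can be sketched in plain Python. The function names and the minimum-session-length threshold below are illustrative assumptions, not Allegro's actual code:

```python
# Illustrative sketch of the session-to-sequence step: group click events
# by session, order product ids by timestamp, and drop sessions too short
# to yield any (center, context) pairs. min_length is an assumption.

def sessions_to_sequences(events, min_length=2):
    """events: iterable of (session_id, timestamp, product_id)."""
    sessions = {}
    for session_id, timestamp, product_id in events:
        sessions.setdefault(session_id, []).append((timestamp, product_id))
    sequences = []
    for visits in sessions.values():
        visits.sort()  # order product ids by timestamp
        seq = [pid for _, pid in visits]
        if len(seq) >= min_length:
            sequences.append(seq)
    return sequences

events = [
    ("s1", 2, "id5"), ("s1", 1, "id8"), ("s1", 3, "id9"),
    ("s2", 1, "id4"),  # single-item session, filtered out
]
print(sessions_to_sequences(events))  # [['id8', 'id5', 'id9']]
```

In the real Scio job the same grouping would be expressed as a windowed `groupBy` over the event stream.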
15. How to obtain word embeddings?
[Diagram: word2vec on “The cat sat on the mat” - context words are looked up in an embedding matrix W (dict_size × emb_dims), their vectors are summed/averaged/concatenated, and a classifier predicts the central word “sat”.]
[Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, NIPS ’13]
16. How to obtain product embeddings?
[Diagram: the same architecture applied to products - in the sequence “id8 id4 id5 id9 id1”, product ids replace words; they are looked up in an embedding matrix W (dict_size × emb_dims), summed/averaged/concatenated, and a classifier predicts the central id.]
[Grbovic et al., E-commerce in Your Inbox: Product Recommendations at Scale, KDD ’15]
17. Training CF-based representations
● Our model learns to predict the surrounding (context) products given the central product
● We rely on system-wide implicit feedback
○ We reduce the bias from past recommendations
○ We have much more training data
● Product embeddings are, in fact, a side effect of the training process
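Predicting context products from the central product is the skip-gram objective. A minimal sketch of how the training pairs are generated from a session sequence (the window size here is an illustrative choice):

```python
# Skip-gram training-pair generation: for each central product, emit one
# (center, context) pair per neighbour inside a fixed window.

def skipgram_pairs(sequence, window=2):
    pairs = []
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

print(skipgram_pairs(["id8", "id4", "id5"], window=1))
# [('id8', 'id4'), ('id4', 'id8'), ('id4', 'id5'), ('id5', 'id4')]
```

Libraries such as gensim hide this step, but it determines how much implicit-feedback signal each session contributes.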
18. Product embeddings - similarity
● 100D embeddings trained with skip-gram (word2vec) on view-sessions
● 2D dimensionality reduction with t-SNE
20. Training CF-based representations
● Different placements require different recommender logic
● On product pages, we want to show similar products
○ We train on sequences of product page views (ids)
● In the cart and in post-buy e-mails we want to show complementary products
○ We train on cart data
21. Training content-based representations
● Textual content
○ Word embeddings trained using fastText
○ Products represented as weighted averages of title word embeddings
● Product images
○ We use Inception v3, pretrained on ImageNet, fine-tuned on Allegro’s products
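The weighted average for titles can be sketched as below. The toy vectors stand in for fastText embeddings, and the idf-style weights are an assumption - the slides only say the average is weighted:

```python
import numpy as np

# Toy word vectors standing in for fastText embeddings; the weighting
# scheme is an illustrative assumption.
word_vecs = {
    "red":   np.array([1.0, 0.0]),
    "shoes": np.array([0.0, 1.0]),
}
word_weight = {"red": 0.5, "shoes": 1.5}  # e.g. idf-like importance

def title_embedding(title):
    words = [w for w in title.lower().split() if w in word_vecs]
    weights = np.array([word_weight[w] for w in words])
    vectors = np.stack([word_vecs[w] for w in words])
    # weighted mean of the word vectors
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()

print(title_embedding("Red Shoes"))  # [0.25 0.75]
```

With real fastText vectors, out-of-vocabulary title words would still get subword-based embeddings instead of being skipped.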
23. Nearest neighbour search
● We could compute distances between a query product and all remaining products, then sort products by distance
● This is prohibitively expensive
● We need some approximation
● Often when searching we want high precision but are willing to compromise on recall
● False negatives are not a problem
24. Nearest neighbour search
● Approximate NN search
○ Searching in almost constant time
○ Random hyperplane projections (aka SimHash or LSH)
● Two popular libraries
○ Annoy
○ Faiss
[Charikar, Similarity Estimation Techniques from Rounding Algorithms, STOC ’02]
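A minimal sketch of the random-hyperplane idea, with hand-picked 2D hyperplanes for clarity (real indices use many random high-dimensional ones): each hyperplane contributes one hash bit, the sign of the dot product, so vectors pointing in similar directions tend to land in the same bucket.

```python
import numpy as np

# Each row is the normal vector of one hyperplane; the hash of a vector
# is the tuple of signs of its projections onto these normals.
hyperplanes = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, -1.0]])

def simhash(vector):
    return tuple((hyperplanes @ vector > 0).astype(int))

print(simhash(np.array([2.0, 1.0])))   # (1, 1, 1)
print(simhash(np.array([2.2, 0.9])))   # (1, 1, 1) - similar direction, same bucket
print(simhash(np.array([-1.0, 3.0])))  # (0, 1, 0) - different direction
```

At query time only vectors sharing the query's bucket (or buckets differing in few bits) are compared exactly, which is what makes the search nearly constant-time.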
25. Nearest neighbour search
● FAISS implements a two-stage search strategy:
○ Vectors are clustered into a relatively small number (e.g. 1000) of partitions (aka Voronoi cells)
○ Each cluster is represented by a centroid
○ During search, a small number (e.g. 5) of cells closest to the query product is selected
○ Exact search is performed within those cells
● FAISS provides further approximations with more advanced quantization techniques
[Jégou et al., Product Quantization for Nearest Neighbor Search, 2011]
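The two-stage (IVF) idea can be sketched without the FAISS library itself. The toy data and hand-picked centroids below are illustrative; FAISS learns the centroids with k-means:

```python
import numpy as np

# Two hand-picked centroids; FAISS would learn these with k-means.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
vectors = np.array([[0.1, 0.2], [0.3, 0.1], [9.8, 10.1], [10.2, 9.9]])

# Index build: assign each vector to its nearest centroid (Voronoi cell).
assignment = np.argmin(
    np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
cells = {c: np.where(assignment == c)[0] for c in range(len(centroids))}

def search(query, nprobe=1):
    # Stage 1: select the nprobe cells whose centroids are closest.
    cell_order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.concatenate([cells[c] for c in cell_order])
    # Stage 2: exact search within the selected cells only.
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return int(candidates[np.argmin(dists)])

print(search(np.array([9.9, 10.0])))  # 2
```

In FAISS this corresponds to `IndexIVFFlat`, where `nprobe` trades recall for speed.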
26. Nearest neighbour search
● For CF-based embeddings we use a single index for all products
● For content-based embeddings we use separate indices for each top-level product category
● Filtering out too-close neighbours
○ Both too close to the source and too close to each other
○ How close is too close (to the source)?
● Eagerly precomputing neighbours for all the products
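The neighbour filtering can be sketched as a greedy pass over the candidate list. The cosine-similarity threshold is an assumption - the slides leave "too close" as an open question:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_neighbours(source, candidates, max_sim=0.99):
    kept = []
    for cand in candidates:
        if cos(cand, source) > max_sim:
            continue  # near-duplicate of the source product
        if any(cos(cand, k) > max_sim for k in kept):
            continue  # near-duplicate of an already kept neighbour
        kept.append(cand)
    return kept

source = np.array([1.0, 0.0])
candidates = [np.array([1.0, 0.001]),   # duplicate of the source -> dropped
              np.array([0.8, 0.6]),
              np.array([0.8, 0.601]),   # duplicate of the previous -> dropped
              np.array([0.0, 1.0])]
print(len(filter_neighbours(source, candidates)))  # 2
```

The same pass is cheap to run while eagerly precomputing neighbour lists for all products.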
28. Orchestration
● For orchestration
○ Spotify Luigi
○ Built around the concept of a task
○ A task has one output and multiple inputs, defined by targets
● Custom Luigi tasks and targets
○ Google ML Engine submit task
■ FAISS and gensim tasks
○ Google Dataflow submit task
30. Serving
● We serve pre-calculated item-to-item recommendations
○ We use key-value stores for recommendations
○ source_id -> [target_id_1, …, target_id_N]
● Performance at serving time
○ Enrichment with product meta-data
○ No-longer-available products are excluded at serving time
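The serving path is essentially a key-value lookup followed by filtering and enrichment. The store contents and field names below are illustrative assumptions:

```python
# Precomputed item-to-item recommendations: source_id -> [target ids].
recs_store = {"id8": ["id4", "id5", "id9"]}
catalog = {
    "id4": {"title": "red shoes", "available": True},
    "id5": {"title": "blue shoes", "available": False},  # sold out
    "id9": {"title": "socks", "available": True},
}

def serve(source_id):
    results = []
    for target_id in recs_store.get(source_id, []):
        meta = catalog.get(target_id)
        if meta and meta["available"]:  # exclude no-longer-available products
            results.append({"id": target_id, "title": meta["title"]})
    return results

print([r["id"] for r in serve("id8")])  # ['id4', 'id9']
```

In production the dict lookups would be calls to the key-value store and the product-catalog service, which is why the lists are slightly over-provisioned: filtering can shrink them at serving time.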
31. Re-ranking
● Learning-to-rank re-ranking is common in search engines
● A LambdaMART reranker improved allegro’s search results by 9% (NDCG)
● We experiment with learning-to-rank algorithms to improve quality
of recommendations
[Burges et al., From RankNet to LambdaRank to LambdaMART: An Overview, 2010]
33. Hybrid recommendations
● For the majority of products we are unable to serve CF recommendations
○ Item cold-start
○ Data sparsity
● We can recommend products based on content similarity. But is there a better way?
34. Hybrid recommendations - our approach
● Find the NN in “content” space narrowed to “popular” items
● Serve CF-based recommendations for the neighbour
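The two-step fallback can be sketched as follows; the toy vectors and lists are illustrative, and "popular" is approximated here as "has a CF recommendation list":

```python
import numpy as np

# Content-space vectors for all items; CF recommendations exist only
# for popular items.
content_vecs = {"cold1": np.array([1.0, 0.1]),
                "pop1":  np.array([0.9, 0.2]),
                "pop2":  np.array([0.0, 1.0])}
cf_recs = {"pop1": ["id4", "id9"], "pop2": ["id7"]}

def hybrid_recs(item_id):
    if item_id in cf_recs:
        return cf_recs[item_id]          # CF available -> use it directly
    query = content_vecs[item_id]
    # Nearest neighbour in content space, narrowed to popular items.
    nearest = min(cf_recs, key=lambda p: np.linalg.norm(content_vecs[p] - query))
    return cf_recs[nearest]              # serve the neighbour's CF list

print(hybrid_recs("cold1"))  # ['id4', 'id9']  (via neighbour pop1)
```

The attraction of this scheme is that cold-start items inherit behaviour-based recommendations instead of purely visual or textual look-alikes.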
35. Hybrid recommendations - meta-prod2vec
● In classic prod2vec we simply use product ids as words
○ id1 id2 … idN
● In meta-prod2vec we interleave ids and meta-data:
○ id1 meta1 id2 meta2 … idN metaN
○ We also need to increase window size
● Improves the quality of product representations
● We obtain metadata embeddings
○ Both product embeddings and metadata embeddings are in the same vector space
○ We can represent a product as a combination of its metadata embeddings
[Vasile et al., Meta-Prod2Vec - Product Embeddings Using Side-Information for Recommendation, RecSys ’16]
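The input preparation for meta-prod2vec is a simple interleaving of ids and metadata tokens; the category values below are illustrative:

```python
# Interleave each product id with its metadata token so that ids and
# metadata are trained into the same vector space.

def interleave(ids, meta):
    sequence = []
    for pid in ids:
        sequence.extend([pid, meta[pid]])
    return sequence

meta = {"id1": "shoes", "id2": "shoes", "id3": "socks"}
print(interleave(["id1", "id2", "id3"], meta))
# ['id1', 'shoes', 'id2', 'shoes', 'id3', 'socks']
```

Because every other token is now metadata, the skip-gram window must roughly double to cover the same number of product ids - which is why the window size has to be increased.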
37. User-to-item recommendations
● Item-to-item recommendations are crucial for Allegro
○ Offer pages - the most visited part of the system
○ The aim is to shorten the Path-to-Purchase
● However, on the main page we need user-to-item recommendations
○ No explicit product context
○ The aim is to inspire users to make new purchases
● In e-mail campaigns we also need user-to-item recommendations
38. User-to-item recommendations - approaches
1. Learn latent representation of the user
○ Obtained from a user-to-item interaction matrix
○ Impossible to retrain user representation in real-time
○ Good for e-mailing campaigns
2. User representation is an aggregation of the representations of (e.g. visited)
products
○ Requires online NN search
○ Challenge: latency under heavy traffic
3. No user representation
○ Interleaved item-to-item recommendations of visited items
○ Easy to implement, fast to serve at runtime
○ Challenge: how to mix recommendations?
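Approach 2 can be sketched with a plain mean as the aggregation - the specific aggregation function and the toy vectors are assumptions:

```python
import numpy as np

product_vecs = {"id4": np.array([1.0, 0.0]),
                "id5": np.array([0.0, 1.0]),
                "id9": np.array([0.9, 0.9]),
                "id7": np.array([-1.0, -1.0])}

def user_vector(visited):
    # Aggregate visited-product embeddings into one user vector.
    return np.mean([product_vecs[p] for p in visited], axis=0)

def recommend(visited, k=1):
    u = user_vector(visited)
    # Online NN search around the user vector, excluding already-visited items.
    candidates = [p for p in product_vecs if p not in visited]
    candidates.sort(key=lambda p: np.linalg.norm(product_vecs[p] - u))
    return candidates[:k]

print(recommend(["id4", "id5"]))  # ['id9']
```

The brute-force sort here is exactly what must be replaced by an approximate NN index under heavy traffic, since the user vector changes with every visit.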
40. Hyperparameter tuning
● “Hyperparameters Matter”
● We hold out some number of training sessions
○ We predict the last item in the session given the penultimate item
● Evaluation metric: MRR@25
● Google ML Engine
○ Convenient, but expensive
● Do-It-Yourself with hyperopt library
○ Much cheaper
[Caselles-Dupré et al., Word2vec applied to Recommendation: Hyperparameters Matter, RecSys ’18]
[Golovin et al., Google Vizier: A Service for Black-Box Optimization, KDD ’17]
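The tuning objective itself is easy to state in code. A minimal MRR@k implementation over held-out sessions (the toy prediction lists are illustrative):

```python
# MRR@k: the mean, over held-out sessions, of the reciprocal rank of the
# true last item within the top-k predictions made from the penultimate
# item (0 if the true item is not in the top k).

def mrr_at_k(predictions, truths, k=25):
    total = 0.0
    for ranked, truth in zip(predictions, truths):
        top = ranked[:k]
        if truth in top:
            total += 1.0 / (top.index(truth) + 1)
    return total / len(truths)

# Two held-out sessions: true item ranked 1st, then ranked 2nd.
preds = [["id4", "id9"], ["id7", "id5", "id9"]]
truths = ["id4", "id5"]
print(mrr_at_k(preds, truths))  # (1/1 + 1/2) / 2 = 0.75
```

Whether tuned via Google ML Engine or hyperopt, this is the number each hyperparameter trial reports back.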
41. Data challenges
● Allegro is an online marketplace, not an online store
● There are many sellers
● The same product can be offered by many sellers
● Huge effort to match offers to products
○ Still, most of the offers do not have a product id
● 6x more offers than users
42. Data challenges
● Calculating recommendations on a “per offer” basis introduces “noise”
○ “Accidental” co-occurrences become a relatively strong signal
● The id dictionary gets very big (110 M offers)
○ Memory consumption, size of the model
● Interaction matrices get very sparse
43. Data challenges
● We try to cluster unmatched offers into pseudo-products
● By title, by category, by attributes
● We want fewer “objects” and a denser interaction matrix
● Product-based recommendations give us better results
○ +44% CTR
○ +34% GMV
● The choice of the product “representative” becomes important
○ How to choose the “best” offer for a product (cheapest, free delivery, etc.) - ranking algorithms
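The clustering step can be sketched as grouping offers by a normalised key. The normalisation below (lowercasing, sorted title words plus category) is an illustrative assumption - the real matching also uses attributes:

```python
# Group unmatched offers into pseudo-products by a normalised
# (category, title) key; picking each cluster's "representative" offer
# is a separate ranking problem.

def pseudo_product_key(offer):
    words = sorted(offer["title"].lower().split())
    return (offer["category"], " ".join(words))

def cluster_offers(offers):
    clusters = {}
    for offer in offers:
        clusters.setdefault(pseudo_product_key(offer), []).append(offer["id"])
    return clusters

offers = [
    {"id": "o1", "title": "Red Shoes 42", "category": "shoes"},
    {"id": "o2", "title": "shoes red 42", "category": "shoes"},
    {"id": "o3", "title": "Blue Socks", "category": "socks"},
]
print(sorted(len(v) for v in cluster_offers(offers).values()))  # [1, 2]
```

Replacing 110M offer ids with far fewer pseudo-product ids both shrinks the embedding dictionary and densifies the interaction matrix, which is where the reported CTR and GMV gains come from.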