Recommendations @ Rakuten Group

RecSys 2015/12/01
Vincent Michel & David Mas
vincent.michel@rakuten.com & mas.david@rakuten.com
Big Data Europe, Big Data Department, Rakuten Inc. / Big Data, PriceMinister

Presentation Overview
•  Rakuten Group
•  Recommendations Challenges
    •  Challenges of Recommendations @ Rakuten
    •  Items Catalogues and Similarities
    •  Exploring Recommendations Models
    •  Recommendations Evaluation and Public Initiatives
•  Conclusion

Rakuten Group Worldwide
Recommendation challenges
•  Different languages
•  User behavior
•  Business areas

Rakuten Group in Numbers
Rakuten in Japan
•  > 12,000 employees
•  > 48 billion euros of GMS
•  > 100,000,000 users
•  > 250,000,000 items
•  > 40,000 merchants
Rakuten Group
•  Kobo: 18,000,000 users
•  Viki: 28,000,000 users
•  Viber: 345,000,000 users

Rakuten Ecosystem
•  Rakuten global ecosystem:
    •  Member-based business model that connects Rakuten services
    •  Rakuten ID common to various Rakuten services
    •  Online shopping and services
Main business areas
•  E-commerce
•  Internet finance
•  Digital content
Recommendation challenges
•  Cross-services
•  Aggregated data
•  Complex user features

Rakuten’s e-commerce: B2B2C Business Model
•  Business to Business to Consumer:
    •  Merchants located in different regions / online virtual shopping mall
    •  Main profit sources
        •  Fixed fees from merchants
        •  Fees based on each transaction and other services
Recommendation challenges
•  Many shops
•  Item references
•  Global catalog

Big Data Department @ Rakuten
Big Data Department
150+ engineers – Japan / Europe / US
Missions
•  Development and operations of internal systems for:
    •  Recommendations
    •  Search
    •  Targeting
    •  User behavior tracking
Average traffic
•  > 100,000,000 events / day
•  > 40,000,000 item views / day
•  > 50,000,000 searches / day
•  > 750,000 purchases / day
Technology stack
•  Java / Python / Ruby
•  Solr / Lucene
•  Cassandra / Couchbase
•  Hadoop / Hive / Pig
•  Redis / Kafka

Recommendations on Rakuten Marketplaces
Non-personalized recommendations
•  All-shop recommendations:
    •  Item to item
    •  User to item
•  In-shop recommendations
•  Review-based recommendations
Personalized recommendations
•  Purchase history recommendations
•  Cart add recommendations
•  Order confirmation recommendations
System status and scale
•  In production in over 35 services of Rakuten Group worldwide
•  Several hundred servers running:
    •  Hadoop
    •  Cassandra
    •  APIs

Challenges in Recommendations
[Diagram: Items Catalogue, Items Similarity, Recommendations engine, Evaluation Process]

•  Items catalogues
    •  Catalogue for multiple shops with different item references?
•  Items similarity / distances
    •  Cross-services aggregation?
    •  Lots of parameters?
•  Recommendations engine
    •  Best / optimal recommendations logic?
•  Evaluation process
    •  Offline / online evaluation?
    •  Long tail? KPI?

Recommendations Architecture: Constantly Evolving
[Architecture diagram. Components: Browsing Events; Purchase Events; Cocounts Storage; Catalogue(s); Distribution layer; Recommendations (offline / materialized); Recommendations (online algebra / multi-arm)]

Items Catalogues
Use different levels of aggregation to improve recommendations
•  Category level (e.g. food, soda, clothes, …)
•  Product level (manufactured items)
•  Item-in-shop level (specific product sold by a specific shop)
Increased statistical power in co-events computation
Easier business handling (picking the right item)
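
The gain in statistical power from aggregation can be sketched as follows. This is a minimal illustration, not the production pipeline: the `ITEM_TO_PRODUCT` mapping, the session data and the function name `co_event_counts` are all made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical catalogue: two shops sell the same manufactured products.
ITEM_TO_PRODUCT = {
    "shopA:cola-33cl": "cola-33cl",
    "shopB:cola-33cl": "cola-33cl",
    "shopA:chips": "chips",
    "shopB:chips": "chips",
}

def co_event_counts(sessions, level=None):
    """Count unordered co-events per pair, optionally after mapping ids to a coarser level."""
    counts = Counter()
    for session in sessions:
        ids = {level[i] for i in session} if level else set(session)
        for pair in combinations(sorted(ids), 2):
            counts[pair] += 1
    return counts

# Toy sessions (invented): the same two products seen through different shops.
sessions = [
    ["shopA:cola-33cl", "shopA:chips"],
    ["shopB:cola-33cl", "shopB:chips"],
    ["shopA:cola-33cl", "shopB:chips"],
]

item_level = co_event_counts(sessions)
product_level = co_event_counts(sessions, level=ITEM_TO_PRODUCT)
```

At item-in-shop level every pair is observed only once, while at product level the same three sessions support a single pair with a count of 3.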

Enriching Catalogues using Record Linkage
Record linkage
•  Use external sources (e.g., Wikidata) to align markets' products
•  Fuzzy matching of 600K vs 350K items for the movie alignment use case
•  Blocking algorithm
Cross recommendation
•  Global catalog
•  Items aggregation
•  Helps with cold start issues
•  Improved navigation
[Diagram: Marketplace 1, Marketplace 2, Reference database]
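
A small sketch of fuzzy matching with blocking, under stated assumptions: the slides do not say which blocking key or string metric is used, so the release year as blocking key, `difflib.SequenceMatcher` as similarity, the 0.8 threshold and the toy records are all choices made for this example only.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase and collapse whitespace before comparing titles."""
    return " ".join(title.lower().split())

def link_records(marketplace, reference, threshold=0.8):
    """Return (marketplace_id, reference_id) pairs whose titles match within a block."""
    # Blocking: only compare records sharing the same key (assumed here: release year),
    # which avoids the full marketplace x reference comparison.
    blocks = defaultdict(list)
    for ref_id, rec in reference.items():
        blocks[rec["year"]].append(ref_id)

    links = []
    for item_id, rec in marketplace.items():
        best_id, best_score = None, threshold
        for ref_id in blocks.get(rec["year"], []):
            score = SequenceMatcher(
                None, normalize(rec["title"]), normalize(reference[ref_id]["title"])
            ).ratio()
            if score >= best_score:
                best_id, best_score = ref_id, score
        if best_id is not None:
            links.append((item_id, best_id))
    return links

# Toy records for illustration only.
marketplace = {
    "m1": {"title": "The Matrix DVD", "year": 1999},
    "m2": {"title": "Amelie", "year": 2001},
}
reference = {
    "r1": {"title": "The Matrix", "year": 1999},
    "r2": {"title": "Amélie", "year": 2001},
    "r3": {"title": "The Matrix Reloaded", "year": 2003},
}
links = link_records(marketplace, reference)
```

With blocking, "The Matrix DVD" is never compared against "The Matrix Reloaded" because the two records fall into different year blocks.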

Co-occurrences and Similarities Computation
Multiple possible parameters:
•  Size of the time window to be considered:
   Do browsing and purchase data reflect similar behavior?
•  Threshold on co-occurrences:
   Is one co-occurrence significant enough to be used? Two? Three?
•  Symmetric or asymmetric:
   Is the order important in the co-occurrence? A then B == B then A?
•  Similarity metrics:
   Which similarity metric should be used on top of the co-occurrences?
Only unitary data is available (purchase / browsing)
Use co-occurrences to compute item similarity
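
The parameters above can be made concrete with a short sketch. It is an illustration under assumptions: baskets stand in for whatever session or time-window grouping is chosen, the function names are invented, and the sketch ignores the order of events (an order-aware variant would count "A then B" separately from "B then A").

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrences(baskets, threshold=1):
    """Count items and unordered item pairs; keep pairs seen at least `threshold` times."""
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    kept = {pair: n for pair, n in pair_counts.items() if n >= threshold}
    return item_counts, kept

def conditional_probability(a, b, item_counts, pair_counts):
    """Asymmetric metric: P(b | a) = count(a, b) / count(a)."""
    n_ab = pair_counts.get(tuple(sorted((a, b))), 0)
    return n_ab / item_counts[a]

def cosine_similarity(a, b, item_counts, pair_counts):
    """Symmetric metric on binary occurrences: count(a, b) / sqrt(count(a) * count(b))."""
    n_ab = pair_counts.get(tuple(sorted((a, b))), 0)
    return n_ab / math.sqrt(item_counts[a] * item_counts[b])

# Toy baskets (invented data).
baskets = [["tv", "cable"], ["tv", "cable"], ["tv"], ["tv", "stand"], ["cable", "stand"]]
item_counts, pair_counts = cooccurrences(baskets, threshold=2)

p_cable_given_tv = conditional_probability("tv", "cable", item_counts, pair_counts)
p_tv_given_cable = conditional_probability("cable", "tv", item_counts, pair_counts)
cos_tv_cable = cosine_similarity("tv", "cable", item_counts, pair_counts)
```

With a threshold of 2, only the (cable, tv) pair survives. The conditional probability differs by direction (2/4 versus 2/3), whereas the cosine value is the same both ways.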

Co-occurrences Example
[Figure: browsing and purchase events dated between 08/09/2015 and 24/11/2015, grouped by session ("Session ?") and by time window ("Time window 1", "Time window 2")]

Co-occurrences Computation
Classical co-occurrences
•  Co-purchases: complementary items
•  Co-browsing: substitute items
Other possible co-occurrences
•  Items browsed and bought together
•  Items browsed and not bought together
Recommendation labels shown: “You may also want…”, “Similar items…”
[Figure: example events dated between 08/09/2015 and 08/11/2015]

Recommendations Algebra
Key ideas
•  Reuse existing logics and combine them easily.
•  Write business logic, not code!
•  Handle multiple input/output formats.
Algebra for defining and combining recommendations engines
Available Logics
•  Content-based
•  Collaborative-filtering
•  Item-item
•  User-item (personalization)
Available Backends
•  In-memory
•  HDF5 files
•  Cassandra
•  Couchbase
Available Hybridization
•  Linear algebra / weighting
•  Mixed
•  Cascade engines
•  Meta-level

Python Algebra Example
>>> engine1 = RecommendationsEngine(nb_recos=20, datatype='purchase',
...                                 asymmetric=True,
...                                 distance='conditional_probability')
>>> engine2 = RecommendationsEngine(similarity_th=0.01, datatype='browsing',
...                                 asymmetric=False,
...                                 distance='cosine_similarity')
>>> composite_engine = engine1 + 0.2 * engine2

Get recommendations from items (item-to-item)

>>> recos = composite_engine.recommendations_by_items([123, 456, 789, …])

[Diagram: composite engine = (purchase-based, top-20, asymmetric, conditional probability) + 0.2 × (browsing-based, similarity > 0.01, symmetric, cosine similarity)]
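
How an expression such as `engine1 + 0.2 * engine2` can work in Python is sketched below with operator overloading. This is not the Rakuten library: `ToyEngine` and its score tables are hypothetical, and only the method name `recommendations_by_items` is borrowed from the slide.

```python
from collections import defaultdict

class ToyEngine:
    """Hypothetical engine holding item -> {candidate: score} similarity scores."""

    def __init__(self, scores):
        self.scores = scores

    def __mul__(self, weight):
        # Scaling an engine scales every candidate score.
        return ToyEngine({item: {c: weight * s for c, s in cands.items()}
                          for item, cands in self.scores.items()})

    __rmul__ = __mul__  # lets `0.2 * engine` work as well as `engine * 0.2`

    def __add__(self, other):
        # Adding two engines sums the scores they give to the same candidate.
        merged = defaultdict(lambda: defaultdict(float))
        for engine in (self, other):
            for item, cands in engine.scores.items():
                for c, s in cands.items():
                    merged[item][c] += s
        return ToyEngine({item: dict(cands) for item, cands in merged.items()})

    def recommendations_by_items(self, items, nb_recos=20):
        # Accumulate scores over the input items, skipping the inputs themselves.
        seen, totals = set(items), defaultdict(float)
        for item in items:
            for c, s in self.scores.get(item, {}).items():
                if c not in seen:
                    totals[c] += s
        return sorted(totals, key=lambda c: (-totals[c], c))[:nb_recos]

# Toy score tables (invented).
purchase = ToyEngine({1: {2: 0.5, 3: 0.1}})
browsing = ToyEngine({1: {3: 1.0, 4: 2.0}})
composite = purchase + 0.2 * browsing
recos = composite.recommendations_by_items([1])
```

Here item 4 only appears in the browsing engine, but its weighted score (0.4) still ranks it above item 3 (0.1 + 0.2) in the composite result.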

Python Algebra with Personalization
>>> history = HistoryEngine(datatype='purchase', time_window=180, time_decay=0.01)
>>> engine1.register_history_engine(history)
…same code as previously (user-to-item)

>>> recos = composite_engine.recommendations_by_user('userid')

[Diagram: composite engine = (purchase-based, top-20, asymmetric) + 0.2 × (browsing-based, similarity > 0.01, symmetric, cosine similarity), combined with a purchase-history engine (time window 180 days, time decay 0.01)]
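
The slides do not define how `time_window` and `time_decay` are applied. One plausible reading, shown here purely as an assumption, is to drop purchases older than the window and weight the rest by `exp(-time_decay * age_in_days)`; the function `history_weights` is hypothetical.

```python
import math
from datetime import date

def history_weights(purchases, today, time_window=180, time_decay=0.01):
    """Weight each purchased item by recency (assumed exponential decay per day)."""
    weights = {}
    for item, day in purchases:
        age = (today - day).days
        if 0 <= age <= time_window:  # ignore purchases outside the window
            w = math.exp(-time_decay * age)
            weights[item] = max(w, weights.get(item, 0.0))  # keep the most recent purchase
    return weights

# Toy purchase history (invented).
today = date(2015, 12, 1)
purchases = [
    ("a", date(2015, 11, 21)),  # 10 days old, kept
    ("b", date(2015, 1, 1)),    # older than 180 days, dropped
    ("a", date(2015, 9, 1)),    # older purchase of the same item, lower weight
]
weights = history_weights(purchases, today)
```

The resulting weights could then scale the item-to-item scores of each item in the user's history before the recommendations are merged.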

Python Algebra – Complete Example
[Diagram: two composite engines combined with X (cascade)]
•  Composite engine 1: purchase-based (top-20, asymmetric) + 0.2 × browsing-based (similarity > 0.01, symmetric, cosine similarity), with purchase history (time window 180 days, time decay 0.01)
•  Composite engine 2: purchase-based (category-level, similarity > 0.01, asymmetric) + 0.1 × browsing-based (category-level, similarity > 0.1, symmetric, cosine similarity)

Recommendation Quality Challenges
[Diagram: 2×2 grid of products, Minor Product vs. Major Product (Popular) and New Product vs. Old Product, with quadrants labeled (A), (B), (C), (D)]
Recommendations categories
•  Cold start issue
    •  External data?
    •  Cross-services?
•  Hot products (A)
    •  Top-N items?
•  Short tail (B)
•  Long tail (C + D)

Long Tail is Fat
Long tail numbers
•  Most of the items are long tail
•  They still represent a large portion of the traffic
[Charts: browsing share and number of items for the Popular, Short tail and Long tail segments]
Long tail approaches
•  Content-based
•  Aggregation / clustering
•  Personalization
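
The two quantities compared on this slide, share of items and share of browsing per segment, can be computed as in the sketch below. The segment cutoffs, the function name `segment_shares` and the view counts are invented for illustration and are not Rakuten figures.

```python
def segment_shares(views_per_item, popular_n, short_tail_n):
    """Split items by view rank into three segments; return each segment's item and view share."""
    ranked = sorted(views_per_item.values(), reverse=True)
    total_items, total_views = len(ranked), sum(ranked)
    bounds = {
        "popular": (0, popular_n),
        "short_tail": (popular_n, popular_n + short_tail_n),
        "long_tail": (popular_n + short_tail_n, total_items),
    }
    shares = {}
    for name, (lo, hi) in bounds.items():
        segment = ranked[lo:hi]
        shares[name] = {
            "item_share": len(segment) / total_items,
            "view_share": sum(segment) / total_views,
        }
    return shares

# Toy view counts (invented): 10 items, 100 views in total.
views = {"a": 50, "b": 20, "c": 10, "d": 5, "e": 5, "f": 4, "g": 3, "h": 2, "i": 1, "j": 0}
shares = segment_shares(views, popular_n=1, short_tail_n=2)
```

In this toy catalogue the long tail holds 70% of the items and still receives 20% of the views.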

Evaluation
[Diagram: Datasets (Browsing History, Query History, Purchase History) and Algorithms; Offline Test (long-term research); Online Test (KPI maximization); "Use as prior"; "Correlation between offline metrics & value"]
Hybrid approach
•  Offline for long-term and prior
•  Online for short-term and maximizing KPIs

Offline Evaluation
Pros/Cons
•  Convenient way to try new ideas
•  Fast and cheap
•  But hard to align with online KPI
Approaches
•  Rescoring
•  Prediction game
•  Business simulator
Target = item bought by user
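
A prediction game with "item bought by user" as target can be scored with a hit rate at k, sketched below. The metric choice, the function names and the toy recommender are assumptions; the slides do not say which offline metric is used.

```python
def hit_rate_at_k(recommend, test_cases, k=20):
    """Fraction of held-out purchases found in the top-k recommendations."""
    if not test_cases:
        return 0.0
    hits = sum(1 for history, bought in test_cases if bought in recommend(history)[:k])
    return hits / len(test_cases)

# Toy recommender: a hypothetical fixed item-to-item table.
SIMILAR = {"tv": ["cable", "stand"], "phone": ["case"]}

def toy_recommend(history):
    """Concatenate the similar items of each history item, without duplicates."""
    recos = []
    for item in history:
        for candidate in SIMILAR.get(item, []):
            if candidate not in history and candidate not in recos:
                recos.append(candidate)
    return recos

# Each test case: (user history, item the user actually bought next). Invented data.
test_cases = [
    (["tv"], "cable"),
    (["tv"], "stand"),
    (["phone"], "charger"),
    (["book"], "lamp"),
]
score_at_1 = hit_rate_at_k(toy_recommend, test_cases, k=1)
score_at_2 = hit_rate_at_k(toy_recommend, test_cases, k=2)
```

Such a score is fast and cheap to compute for many engine variants, but as noted above it is not guaranteed to track the online KPI.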

Offline Evaluation for Online Learning

Public Initiative – Viki Recommendation Challenge
567 submissions from 132 participants
http://www.dextra.sg/challenges/rakuten-viki-video-challenge

Conclusion
Rakuten provides marketplaces worldwide
Specific challenges for recommendations
•  Items catalogue: reinforce the statistical power of co-occurrences across shops and services
•  Items similarities: find the right parameters for the different use cases
•  Recommendations models: what are the best models for in-shop, all-shops and personalization?
•  Evaluation: handling the long tail? Comparing different models?

We are Hiring!
Data Scientist / Software Developer
•  Build algorithms for recommendations, search, targeting
•  Predictive modeling, machine learning, natural language processing
•  Working closely with the business
•  Python, Java, Hadoop, Couchbase, Cassandra…
•  Also hiring: search engine developers, big data system administrators, etc.

Big Data Department – team in Paris
http://global.rakuten.com/corp/careers/bigdata/
http://www.priceminister.com/recrutement/?p=197

Thanks!
Questions?
More on Rakuten tech initiatives
http://www.slideshare.net/rakutentech
http://rit.rakuten.co.jp/oss.html
http://rit.rakuten.co.jp/opendata.html
Positions
•  http://global.rakuten.com/corp/careers/bigdata/
•  http://www.priceminister.com/recrutement/?p=197
