This document discusses recommendation challenges at Rakuten Group. It gives an overview of Rakuten Group, which has over 100 million users and 250 million items across its e-commerce services. Key challenges for recommendations include handling different languages and business areas across services. The document also covers building item catalogues, computing item similarities, exploring recommendation models with an algebra framework, and evaluating recommendations both offline and through public challenges.
1. Recommendations @ Rakuten Group
RecSys 2015/12/01
Vincent Michel & David Mas
vincent.michel@rakuten.com & mas.david@rakuten.com
Big Data Europe, Big Data Department, Rakuten Inc. / Big Data, PriceMinister
2. Presentation Overview
§ Rakuten Group
§ Recommendations Challenges
• Challenges of Recommendations @ Rakuten
• Items Catalogues and Similarities
• Exploring Recommendations Models
• Recommendations Evaluation and Public Initiatives
§ Conclusion
4. Rakuten Group in Numbers
Rakuten in Japan
• > 12,000 employees
• > 48 billion euros of GMS
• > 100,000,000 users
• > 250,000,000 items
• > 40,000 merchants
Rakuten Group
• Kobo: 18,000,000 users
• Viki: 28,000,000 users
• Viber: 345,000,000 users
5. Rakuten Ecosystem
• Rakuten global ecosystem:
  • Member-based business model that connects Rakuten services
  • Rakuten ID common to various Rakuten services
  • Online shopping and services
Main business areas
• E-commerce
• Internet finance
• Digital content
Recommendation challenges
• Cross-services
• Aggregated data
• Complex user features
6. Rakuten’s e-commerce: B2B2C Business Model
• Business to Business to Consumer:
  • Merchants located in different regions / online virtual shopping mall
  • Main profit sources
    • Fixed fees from merchants
    • Fees based on each transaction and other services
Recommendation challenges
• Many shops
• Item references
• Global catalog
7. Big Data Department @ Rakuten
Big Data Department
150+ engineers in Japan / Europe / US
Missions
• Development and operations of internal systems for:
  • Recommendations
  • Search
  • Targeting
  • User behavior tracking
Average traffic
• > 100,000,000 events / day
• > 40,000,000 item views / day
• > 50,000,000 searches / day
• > 750,000 purchases / day
Technology stack
• Java / Python / Ruby
• Solr / Lucene
• Cassandra / Couchbase
• Hadoop / Hive / Pig
• Redis / Kafka
9. Recommendations on Rakuten Marketplaces
Non-personalized recommendations
• All-shop recommendations:
  • Item to item
  • User to item
• In-shop recommendations
• Review-based recommendations
Personalized recommendations
• Purchase history recommendations
• Cart add recommendations
• Order confirmation recommendations
System status and scale
• In production in over 35 services of Rakuten Group worldwide
• Several hundred servers running:
  • Hadoop
  • Cassandra
  • APIs
13. Items Catalogues
Use different levels of aggregation to improve recommendations
• Category-level (e.g. food, soda, clothes, …)
• Product-level (manufactured items)
• Item-in-shop level (a specific product sold by a specific shop)
Increased statistical power in co-events computation
Easier business handling (picking the right item)
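As an illustration of why aggregation increases statistical power, the sketch below rolls item-in-shop co-event counts up to product level. The function name, the identifiers, and the mapping are hypothetical and are not taken from the Rakuten system.

```python
from collections import Counter


def aggregate_counts(pair_counts, item_to_product):
    """Sum item-in-shop co-event counts at product level (hypothetical helper)."""
    aggregated = Counter()
    for (a, b), n in pair_counts.items():
        pa, pb = item_to_product[a], item_to_product[b]
        if pa != pb:  # the same product sold by two shops is not a co-event
            aggregated[tuple(sorted((pa, pb)))] += n
    return aggregated


# Made-up data: each shop-level pair was seen only once.
shop_level = {
    ("shopA:tv", "shopA:cable"): 1,
    ("shopB:tv", "shopB:cable"): 1,
    ("shopA:tv", "shopB:tv"): 1,
}
mapping = {
    "shopA:tv": "tv", "shopB:tv": "tv",
    "shopA:cable": "cable", "shopB:cable": "cable",
}
product_level = aggregate_counts(shop_level, mapping)
```

Two weak shop-level signals become one product-level pair with a count of 2, which is easier to threshold reliably.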
14. Enriching Catalogues using Record Linkage
Record linkage
• Use external sources (e.g., Wikidata) to align markets' products
• Fuzzy matching of 600K vs. 350K items for the movie alignment use case
• Blocking algorithm
Cross recommendation
• Global catalog
• Items aggregation
• Helps with cold start issues
• Improved navigation
[Figure: Marketplace 1, Marketplace 2, and a reference database]
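A minimal sketch of record linkage with blocking, using only the Python standard library. The blocking key (first three alphanumeric characters), the similarity threshold, and all identifiers are assumptions chosen for illustration. The slide does not say which blocking or matching method was used.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(title):
    """Hypothetical blocking key: first three alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in title.lower() if ch.isalnum())
    return cleaned[:3]


def link_records(catalog, reference, threshold=0.85):
    """Return (catalog_id, reference_id, score) for each best match above threshold."""
    # Blocking: only titles that share a key are compared, which avoids
    # the full catalog x reference comparison.
    blocks = defaultdict(list)
    for ref_id, ref_title in reference.items():
        blocks[blocking_key(ref_title)].append((ref_id, ref_title))

    links = []
    for item_id, title in catalog.items():
        best = None
        for ref_id, ref_title in blocks.get(blocking_key(title), []):
            score = SequenceMatcher(None, title.lower(), ref_title.lower()).ratio()
            if score >= threshold and (best is None or score > best[2]):
                best = (item_id, ref_id, score)
        if best is not None:
            links.append(best)
    return links


# Made-up catalogues; the reference ids are placeholders, not real Wikidata ids.
marketplace_1 = {"m1-1": "The Godfather", "m1-2": "Spirited Away"}
reference_db = {
    "ref-1": "The Godfather (1972)",
    "ref-2": "Spirited Away",
    "ref-3": "Seven Samurai",
}
links = link_records(marketplace_1, reference_db, threshold=0.7)
```

Running the same linkage for a second marketplace against the same reference database would give both marketplaces shared identifiers, which is what a global catalog needs.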
15. Co-occurrences and Similarities Computation
Multiple possible parameters:
• Size of the time window to consider: do browsing and purchase data reflect similar behavior?
• Threshold on co-occurrences: is one co-occurrence significant enough to be used? Two? Three?
• Symmetric or asymmetric: is the order important in the co-occurrence? Is "A then B" the same as "B then A"?
• Similarity metrics: which similarity metric should be computed from the co-occurrences?
Only unitary data (purchase / browsing) is accessible
Co-occurrences are used to compute item similarity
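The parameters above can be made concrete with a small sketch. It counts co-occurrences per basket, applies a minimum co-occurrence threshold, and computes one symmetric metric (cosine) and one asymmetric metric (conditional probability). The function names and data are illustrative assumptions, and the sketch ignores time windows and event order.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_counts(baskets):
    """Count item occurrences and unordered pair co-occurrences per basket."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return item_counts, pair_counts


def similarities(baskets, min_cooccurrence=2):
    """Return (cosine, conditional) similarity tables for pairs above the threshold."""
    item_counts, pair_counts = cooccurrence_counts(baskets)
    cosine = {}
    conditional = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_cooccurrence:
            continue  # too few co-occurrences to be trusted
        cosine[(a, b)] = n_ab / sqrt(item_counts[a] * item_counts[b])
        conditional[(a, b)] = n_ab / item_counts[a]  # P(b | a)
        conditional[(b, a)] = n_ab / item_counts[b]  # P(a | b)
    return cosine, conditional


baskets = [["tv", "cable"], ["tv", "cable", "stand"], ["tv"], ["tv", "stand"]]
cosine, conditional = similarities(baskets, min_cooccurrence=2)
```

In this toy data every cable buyer also bought a TV, but only half of the TV buyers bought a cable, so the conditional probability differs by direction while the cosine value does not.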
17. Co-occurrences Computation
Classical co-occurrences
• Co-purchases: complementary items ("You may also want…")
• Co-browsing: substitute items ("Similar items…")
Other possible co-occurrences
• Items browsed and bought together
• Items browsed and not bought together
19. Recommendations Algebra
Key ideas
• Reuse existing logics and combine them easily.
• Write business logic, not code!
• Handle multiple input/output formats.
Algebra for defining and combining recommendation engines
Available logics
• Content-based
• Collaborative filtering
• Item-item
• User-item (personalization)
Available backends
• In-memory
• HDF5 files
• Cassandra
• Couchbase
Available hybridization
• Linear algebra / weighting
• Mixed
• Cascade engines
• Meta-level
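To illustrate what an algebra over engines can look like, here is a minimal sketch using Python operator overloading for the weighting hybridization. The `Engine`, `Weighted`, and `Sum` classes are hypothetical and written for this example. They are not the Rakuten library's API.

```python
class Engine:
    """Hypothetical engine wrapping an item -> {candidate: score} table."""

    def __init__(self, table):
        self.table = table

    def scores(self, item):
        return dict(self.table.get(item, {}))

    def __rmul__(self, weight):
        # Enables `0.2 * engine`.
        return Weighted(self, weight)

    def __add__(self, other):
        # Enables `engine_a + engine_b`.
        return Sum(self, other)

    def top(self, item, n=10):
        ranked = sorted(self.scores(item).items(), key=lambda kv: (-kv[1], kv[0]))
        return [candidate for candidate, _ in ranked[:n]]


class Weighted(Engine):
    """Scales every score of the wrapped engine by a constant weight."""

    def __init__(self, engine, weight):
        self.engine, self.weight = engine, weight

    def scores(self, item):
        return {c: self.weight * s for c, s in self.engine.scores(item).items()}


class Sum(Engine):
    """Adds the scores of two engines candidate by candidate."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def scores(self, item):
        merged = self.left.scores(item)
        for c, s in self.right.scores(item).items():
            merged[c] = merged.get(c, 0.0) + s
        return merged


# Made-up similarity tables.
purchase = Engine({"tv": {"cable": 0.6, "stand": 0.3}})
browsing = Engine({"tv": {"stand": 1.0, "projector": 0.5}})
composite = purchase + 0.2 * browsing
top_items = composite.top("tv", n=3)
```

The business rule is then the single expression `purchase + 0.2 * browsing`, and the composite object can itself be weighted or added to other engines.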
21. Python Algebra with Personalization
>>> history = HistoryEngine(datatype='purchase', time_window=180, time_decay=0.01)
>>> engine1.register_history_engine(history)
>>> # …same code as previously (user-to-item)
>>> recos = composite_engine.recommendations_by_user('userid')
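[Figure: composite engine with personalization]
• Purchase-based engine: top-20, asymmetric, conditional probability
• Browsing-based engine: similarity > 0.01, symmetric, cosine similarity
• Combination: + with weight 0.2, giving the composite engine
• Purchase-history engine: time window 180 days, time decay 0.01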
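One possible reading of the purchase-history settings (time window 180 days, time decay 0.01) is an exponential weight on each past purchase. The sketch below assumes that interpretation. The decay formula, the integer day numbers, and the function names are assumptions, since the slide does not define how `time_decay` is applied.

```python
from math import exp


def history_weights(purchases, today, time_window=180, time_decay=0.01):
    """Weight each purchased item by exp(-time_decay * age_in_days) within the window."""
    weights = {}
    for item, day in purchases:
        age = today - day
        if 0 <= age <= time_window:
            weights[item] = max(weights.get(item, 0.0), exp(-time_decay * age))
    return weights


def recommend_for_user(purchases, item_scores, today, n=5):
    """Aggregate item-to-item scores over the user's weighted purchase history."""
    weights = history_weights(purchases, today)
    totals = {}
    for item, weight in weights.items():
        for candidate, score in item_scores.get(item, {}).items():
            if candidate not in weights:  # skip items bought within the window
                totals[candidate] = totals.get(candidate, 0.0) + weight * score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:n]]


# Made-up data: days are plain integers, "phone" falls outside the 180-day window.
purchases = [("tv", 390), ("phone", 100)]
item_scores = {"tv": {"cable": 0.6, "stand": 0.3}, "phone": {"case": 0.9}}
weights = history_weights(purchases, today=400)
recos = recommend_for_user(purchases, item_scores, today=400)
```

With these settings a purchase made 10 days ago keeps about 90% of its weight, and anything older than the window is ignored.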
22. Python Algebra – Complete Example
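[Figure: two composite engines chained by a cascade (X)]
First composite engine (item-level)
• Purchase-based engine: top-20, asymmetric, conditional probability
• Browsing-based engine: similarity > 0.01, symmetric, cosine similarity
• Combination: + with weight 0.2
• Purchase-history engine: time window 180 days, time decay 0.01
Second composite engine (category-level)
• Purchase-based engine: category-level, similarity > 0.01, asymmetric, conditional probability
• Browsing-based engine: category-level, similarity > 0.1, symmetric, cosine similarity
• Combination: + with weight 0.1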
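The slide does not say how the cascade (X) operates. One common reading, assumed here, is that the second engine only fills the slots the first engine leaves empty, for instance when an item has too little item-level data. The `cascade` function below is a hypothetical sketch of that reading.

```python
def cascade(primary, fallback, n=5):
    """Return up to n candidates: primary results first, then fallback to fill gaps."""
    results = list(primary[:n])
    for candidate in fallback:
        if len(results) >= n:
            break
        if candidate not in results:
            results.append(candidate)
    return results


# Made-up ranked lists from an item-level engine and a category-level engine.
item_level = ["cable"]
category_level = ["stand", "cable", "mount", "remote"]
final = cascade(item_level, category_level, n=3)
```

Under this reading the category-level engine acts as a safety net for items with sparse co-occurrence data.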
24. Recommendation Quality Challenges
[Figure: product quadrants (A), (B), (C), (D), by popularity (minor vs. major/popular product) and age (new vs. old product)]
Recommendations categories
• Cold start issue
  • External data?
  • Cross-services?
• Hot products (A)
  • Top-N items?
• Short tail (B)
• Long tail (C + D)
25. Long Tail is Fat
Long tail numbers
• Most of the items are long tail
• They still represent a large portion of the traffic
[Figure: browsing share and number of items for the popular, short tail, and long tail segments]
Long tail approaches
• Content-based
• Aggregation / clustering
• Personalization
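The "long tail is fat" observation can be checked on any view log with a sketch like the one below. It ranks items by views, splits them into popular, short tail, and long tail segments, and reports each segment's item count and browsing share. The segment sizes and the data are arbitrary assumptions, not Rakuten figures.

```python
def tail_shares(view_counts, popular_n=1, short_tail_n=2):
    """Return {segment: (number_of_items, browsing_share)} for ranked items."""
    ranked = sorted(view_counts.values(), reverse=True)
    total = sum(ranked)  # assumes at least one view overall
    segments = {
        "popular": ranked[:popular_n],
        "short_tail": ranked[popular_n:popular_n + short_tail_n],
        "long_tail": ranked[popular_n + short_tail_n:],
    }
    return {name: (len(counts), sum(counts) / total) for name, counts in segments.items()}


# Made-up view counts for eight items.
views = {"a": 50, "b": 15, "c": 10, "d": 5, "e": 5, "f": 5, "g": 5, "h": 5}
shares = tail_shares(views)
```

In this toy data the long tail holds five of the eight items and still accounts for a quarter of the views.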
26. Evaluation
[Figure: evaluation workflow]
• Datasets: browsing history, query history, purchase history
• Algorithms
• Offline test: long-term research, used as a prior
• Online test: KPI maximization
• Correlation between offline metrics and value
Hybrid approach
• Offline for long-term and prior
• Online for short-term and maximizing KPIs
27. Offline Evaluation
Pros/Cons
• Convenient way to try new ideas
• Fast and cheap
• But hard to align with online KPI
Approaches
• Rescoring
• Prediction game
• Business simulator
Target = item bought by user
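With "item bought by user" as the target, a prediction-game style offline test can be scored with a hit rate at k. The metric choice, the function name, and the data below are assumptions for illustration. The slide does not state which offline metric was used.

```python
def hit_rate_at_k(recommendations, held_out_purchases, k=10):
    """Fraction of users whose held-out purchased item appears in their top-k list."""
    users = [u for u in held_out_purchases if u in recommendations]
    if not users:
        return 0.0
    hits = sum(1 for u in users if held_out_purchases[u] in recommendations[u][:k])
    return hits / len(users)


# Made-up ranked recommendations and one held-out purchase per user.
recommendations = {
    "u1": ["cable", "stand"],
    "u2": ["case"],
    "u3": ["mount", "stand"],
}
held_out = {"u1": "stand", "u2": "charger", "u3": "mount"}
hr_at_2 = hit_rate_at_k(recommendations, held_out, k=2)
hr_at_1 = hit_rate_at_k(recommendations, held_out, k=1)
```

Such a score is fast and cheap to compute, but as noted above it is hard to align with online KPIs.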
29. Public Initiative – Viki Recommendation Challenge
567 submissions from 132 participants
http://www.dextra.sg/challenges/rakuten-viki-video-challenge
31. Conclusion
Items catalogue: reinforce the statistical power of co-occurrences across shops and services.
Items similarities: find the right parameters for the different use cases.
Recommendations models: what are the best models for in-shop, all-shops, and personalization?
Evaluation: how to handle the long tail? How to compare different models?
Rakuten provides marketplaces worldwide
Specific challenges for recommendations
32. We are Hiring!
Data Scientist / Software Developer
• Build algorithms for recommendations, search, targeting
• Predictive modeling, machine learning, natural language processing
• Working closely with the business
• Python, Java, Hadoop, Couchbase, Cassandra…
• Also hiring: search engine developers, big data system administrators, etc.
Big Data Department – team in Paris
http://global.rakuten.com/corp/careers/bigdata/
http://www.priceminister.com/recrutement/?p=197
33. Thanks!
Questions?
More on Rakuten tech initiatives
http://www.slideshare.net/rakutentech
http://rit.rakuten.co.jp/oss.html
http://rit.rakuten.co.jp/opendata.html