2. 2
平手勇宇
Principal Scientist, Rakuten Institute of Technology
Manager, Intelligence domain research group
Bio.
• 2005-2008: CS Division, Graduate School of Science and Engineering,
Waseda University.
• 2006-2009: Research Associate, Media Network Center, Waseda
University.
• 2009-present: Rakuten Institute of Technology.
Working on projects that extract knowledge from large-scale data
using data mining and machine learning technologies.
3. 3
Masaya Mori
Global head
• Established in 2006.
• Launched R.I.T. NY in 2010.
• Launched R.I.T. Paris in 2014.
• Launched R.I.T. Singapore / Boston in 2015.
Strategic R&D organization for Rakuten Group
5. 5
Three research groups adapting to Internet growth
Reality / Intelligence / Power
• HCI
• AR / VR
• Image Processing
• Distributed Computing
• HPC
• IoT
• Machine Learning
• Deep Learning
• NLP
• Data Mining
6. 6
Optimizing A/B testing
Item Classification
User Segmentation
AI
Coupon Distribution
Recommender System
Economy Prediction / Demand Prediction
Review Analysis
Anomaly Detection / Fraud Detection
Image Recognition
9. 9
Why are we working on this problem? (Key Benefits)
‣ To organize our catalog in accordance with customer
expectations
‣ To precisely search our catalog for products and their variants
‣ To measure and enforce merchant KPIs
What are we doing? (Key Tasks)
‣ Product Genre Classification
‣ Attribute Extraction from Product Information
‣ Merchant and Item Review Analysis
How are we doing it? (Key Technologies)
‣ Large-Scale Gradient Boosted Decision Trees
‣ Deep Learning (RNNs, CNNs, others)
‣ Computing a Massive Number of NLP Features
Product Catalog
Businesses
10. 10
Each product can be assigned a category and attributes. For instance:
Category: Grocery & food
Subcategory: Wine
Each (sub)category has a number of relevant attributes with a list of valid values
Challenge: this structured information is not always present or correct
Goal: automatically predict category and attributes from text and/or images
https://item.rakuten.co.jp/kawahara/345812/
11. 11
Classifier based on
Deep Learning Algorithm (CNN)
Prec@1 92%
Prec@10 99%
Classifier based on
Deep Learning Algorithm (CNN)
Prec@1 57%
Prec@3 75%
Extracting Words
* Tested on Ichiba L3 categories (1.5K categories)
* Tested on PriceMinister image data
Text Data
• Item Title
• Item Description
Image Data
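The slide reports Prec@1, Prec@3 and Prec@10 without defining them. A minimal sketch, assuming Prec@k means the fraction of items whose true category appears among the classifier's top-k ranked predictions. The function name `precision_at_k` and the sample categories are hypothetical and are not taken from the Ichiba or PriceMinister data.

```python
from typing import Sequence


def precision_at_k(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int) -> float:
    """Fraction of items whose true category is among the top-k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one item is required")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)


# Hypothetical ranked classifier outputs (best guess first) for four items.
predictions = [
    ["Wine", "Beer", "Sake"],
    ["Smartphone", "Audio", "Camera"],
    ["Audio", "Earphone / Headphone", "Smartphone"],
    ["Books", "Magazine", "Comics"],
]
labels = ["Wine", "Audio", "Earphone / Headphone", "Toys"]

prec_at_1 = precision_at_k(predictions, labels, 1)  # 1 of 4 items is a hit
prec_at_3 = precision_at_k(predictions, labels, 3)  # 3 of 4 items are hits
```

Under this reading, Prec@k can only grow with k, which matches the pattern on the slide (92% to 99%, 57% to 75%).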
12. 12
Hobby and Entertainment
> Books and Magazine
> Business
Electronics
> Audio
> Earphone / Headphone
Electronics
> Smartphone
> AC Adaptor / Battery
15. 15
Detect prospective applicants from Ichiba purchasers
by using their purchase trends and demographics
Ichiba Active Users → (extract) → Prospective Applicants for a finance service
18. 18
Prospective Users
• Score >= 0.8
• About 300,000 users
Control Group
• Randomly Selected
• About 300,000 users
Send the Ichiba mail magazine to both groups
Ichiba Mail Magazine
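The group construction described on this slide can be sketched as follows. This is an illustration under assumptions, not the team's actual pipeline: the `scores` mapping, the `split_groups` helper, and the choice to draw the control group from all scored users at the same size as the prospective group are hypothetical. Only the 0.8 threshold and the random selection come from the slide.

```python
import random
from typing import Dict, List, Tuple


def split_groups(scores: Dict[str, float], threshold: float = 0.8,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (prospective, control) user-id lists.

    prospective: users whose model score is at or above the threshold.
    control: a random sample of the same size drawn from all scored users
             (an assumption; the slide only says "Randomly Selected").
    """
    prospective = sorted(uid for uid, s in scores.items() if s >= threshold)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    control = rng.sample(sorted(scores), k=len(prospective))
    return prospective, control


# Hypothetical model scores for a handful of users.
scores = {"u1": 0.95, "u2": 0.40, "u3": 0.81, "u4": 0.10,
          "u5": 0.79, "u6": 0.88, "u7": 0.55, "u8": 0.20}

prospective, control = split_groups(scores)
```

Keeping the two groups the same size, as the slide does with about 300,000 users each, makes the downstream open and click rates directly comparable.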
19. 19
Deliver Mail
Open Mail
Click Contents (Visit Service Page)
Click rate went up by 49.23%
compared with the control group
+3.52% +49.23%
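A percentage such as the reported click-rate increase is typically a relative lift over the control group. A minimal sketch of that arithmetic, assuming the rate is clicks divided by delivered mails (the slide does not say which denominator was used). The counts below are invented for illustration and do not reproduce the reported figures.

```python
def rate(events: int, base: int) -> float:
    """Events per unit of base, e.g. clicks per delivered mail."""
    if base <= 0:
        raise ValueError("base must be positive")
    return events / base


def relative_lift(treatment_rate: float, control_rate: float) -> float:
    """Relative change of the treatment rate over the control rate."""
    if control_rate <= 0:
        raise ValueError("control rate must be positive")
    return (treatment_rate - control_rate) / control_rate


# Hypothetical counts: both groups receive 300,000 mails.
control_click_rate = rate(3_000, 300_000)      # 1.0% of control users click
treatment_click_rate = rate(4_500, 300_000)    # 1.5% of prospective users click

lift = relative_lift(treatment_click_rate, control_click_rate)
lift_percent = 100 * lift  # roughly +50% in this made-up example
```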
22. 22
Translate Chinese sentences into English
Extracted 10,000 Chinese-English sentence-pairs to
evaluate commercial APIs and IBot, e.g.,
我一个老总都亲自跑了好几趟了
I’m a director and yet I’ve made so many trips
Extracted another 2.1 million sentence-pairs to train
IBot’s model
23. 23
Applying Attentional Recurrent Neural Networks (RNN)
Neural Machine Translation by Jointly Learning to
Align and Translate [Bahdanau, Cho & Bengio, ICLR 2015]
658 citations (Google Scholar)
Train the RNN with 2.1 million Chinese-English sentence pairs
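The attention step at the core of the cited paper can be sketched in a few lines. This is a toy, pure-Python illustration of additive attention: each encoder state h_j is scored as v · tanh(W s + U h_j), the scores are normalized with a softmax, and the context vector is the weighted sum of encoder states. It is not IBot's implementation; the matrices, vectors and dimensions below are made-up values, and a real system would learn W, U and v during training.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state: Vector, encoder_states: List[Vector],
                       W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Return (attention weights, context vector) for one decoding step."""
    projected_state = matvec(W, decoder_state)
    scores = []
    for h in encoder_states:
        projected_h = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Toy values: three source positions, two-dimensional states.
decoder_state = [0.1, -0.2]
encoder_states = [[0.5, 0.1], [-0.3, 0.8], [0.0, 0.2]]
W = [[0.2, -0.1], [0.4, 0.3]]
U = [[0.1, 0.5], [-0.2, 0.3]]
v = [0.7, -0.4]

weights, context = additive_attention(decoder_state, encoder_states, W, U, v)
```

The weights form a probability distribution over source positions, which is what lets the decoder "align" each output word with the relevant part of the input sentence.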
24. 24
Evaluated on 10,000 Chinese-English sentence pairs
System            BLEU (%)   METEOR (%)
Google API              12           20
Microsoft API           12           20
IBM Watson API           3           12
RIT (Aug 24)            10           15
RIT (Sep 7)             14           19
RIT (Sep 21)            22           24
RIT (Nov 28)            36           30
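For readers unfamiliar with the BLEU column, a simplified corpus-level BLEU can be sketched as below: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. This is an illustrative sketch under assumptions (one reference per sentence, whitespace tokenization, no smoothing), not the evaluation script used for the table. The sample sentences are a lowercased, hand-tokenized variant of the example pair shown earlier, plus an invented system output.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Simplified corpus-level BLEU in [0, 1], one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Counter intersection keeps the minimum count per n-gram (clipping).
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "i am a director and yet i have made so many trips".split()
hypothesis = "i am a director and i have made many trips".split()  # invented output

perfect_score = corpus_bleu([reference], [reference])
partial_score = corpus_bleu([hypothesis], [reference])
bleu_percent = 100 * partial_score  # the table reports BLEU as a percentage
```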