Oct.28.2017
Ewa Szymanska, PhD
Head of Rakuten Institute of Technology Singapore
2
Source: https://unsplash.com/ by Element5 Digital
3
“I am watching shows in Chinese to get used to ‘actual’ spoken Mandarin, and not just what I see in my textbooks”
- VIKI user
4
* Images from Rakuten VIKI, Rakuten TV
5
1.8 billion people are learning foreign languages
Source: The Washington Post: https://www.washingtonpost.com/news/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts
Languages with most
native speakers
Most commonly studied
foreign languages
6
Online individual language learning market is growing at 12% CAGR
Source: Rosetta Stone Investor Day 2017
7
I. Entertaining Content II. Global Users III. Technology
*Photo by Jakob Owens on Unsplash
8
1 Interactive subtitles
2 Quizzes
3 Video dictionary
* Images from Rakuten VIKI
9
1 Interactive subtitles
Fast adoption: 30,000 DAU (daily active users)
High engagement: Korean Learn Mode users view 10% more than Viki average
High satisfaction: 83 NPS (net promoter score)
*cnet.com @ CBS Interactive Inc. Apr 13, 2017; Keia.org, Korean Economic Institute, Apr 2017; Forbes Oct 24, 2017; The Verge, Sep 28, 2017
Shows availability
“Daughter Back”, “Return of Happiness”, “Ice and Fire of Youth”, “My Love from the Star”, “Boys Over Flowers”, “Descendants of the Sun”
Learn Chinese (Japan) Learn Korean (USA)
* Images from Rakuten VIKI
[ Learn Mode collection on viki.com ]
11
2 Drama Vocab Quiz [ languagequiz.viki.com ]
• 60,000+ quizzes taken
• 35,000+ users completed the quiz
• Very positive social media engagement
12
3 Video-based Dictionary
Integrate with the classroom curriculum:
13
“ If you talk to a man
in a language he understands,
that goes to his head.
If you talk to him in his language,
that goes to his heart. ”
- Nelson Mandela
14
Oct 28, 2017
Stanley Kok
Principal Research Scientist
Rakuten Institute of Technology (Singapore)
16
Splitting a sentence into pieces, each preserving its original semantics
你是辣妹,也是名门贵族
你 是 辣妹 , 也是 名门贵族: you are (a) hot chick and also (of) the gentry
你 是 辣妹 , 也是 名门贵 族: you are (a) hot chick and also tribe
17
努力的人才会成功
努力 的 人 才 会 成功
only hardworking people will succeed
努力 的 人才 会 成功
hardworking talent will succeed
18
Tokenization
19
Dictionary Lookup
20
Many open-source tokenizers available
Good, but not perfect
Different mistakes
Why not use more (or all) of them to improve tokenization?
Strengths of one tokenizer overcome shortcomings of another
21
How to quantify “goodness” of tokenization?
Take human learner’s perspective
#Dictionary look-ups needed to understand all tokens
Non-existent tokens assumed to need large #lookups (10)
你 是 辣妹 (you, are, hot chick): 1 + 1 + 1 = 3
你 是 辣 妹 (you, are, spicy, younger sister): 1 + 1 + 1 + 1 = 4
你 是辣 妹 (you, ?, younger sister): 1 + 10 + 1 = 12
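The look-up cost on this slide can be reproduced in a few lines. This is only a sketch: the toy `dictionary` set and the function name `lookup_cost` are assumptions for illustration; the cost of 1 per known token and 10 per non-existent token comes from the slide.

```python
MISSING_COST = 10  # slide's penalty for a token that is not in the dictionary


def lookup_cost(tokens, dictionary, missing_cost=MISSING_COST):
    """Number of dictionary look-ups a learner needs to understand all tokens."""
    return sum(1 if token in dictionary else missing_cost for token in tokens)


# Toy dictionary (assumption): only the entries needed for this example.
dictionary = {"你", "是", "辣妹", "辣", "妹"}

candidates = [
    ["你", "是", "辣妹"],
    ["你", "是", "辣", "妹"],
    ["你", "是辣", "妹"],  # "是辣" is not a word, so it costs 10
]

costs = [lookup_cost(tokens, dictionary) for tokens in candidates]
best = min(candidates, key=lambda tokens: lookup_cost(tokens, dictionary))
print(costs, best)
```

Picking the minimum-cost candidate is the baseline that the next slide improves on.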
22
Can do better than picking lowest cost
tokenization from tokenizers
Treat common tokens as “anchor points”
Pick best tokens from remaining ones
23
你 是 辣妹 也是 名门贵 族 (you are hot chick and also tribe): cost 15
你 是辣 妹 也是 名门贵族 (you ? younger sister and also (of) the gentry): cost 14
你 是 辣妹 也是 名门贵族 (you are hot chick and also (of) the gentry): cost 5
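One way to sketch the anchor-point idea: treat the token boundaries that every tokenizer agrees on as anchors, and between two consecutive anchors keep whichever tokenizer's pieces need the fewest look-ups. The slides do not give the actual algorithm, so using shared boundaries (rather than shared tokens), the helper names and the toy dictionary are all assumptions.

```python
def lookup_cost(tokens, dictionary, missing_cost=10):
    """1 look-up per known token, 10 per token missing from the dictionary."""
    return sum(1 if t in dictionary else missing_cost for t in tokens)


def boundaries(tokens):
    """Character offsets at which a tokenization ends a token."""
    pos, cuts = 0, set()
    for t in tokens:
        pos += len(t)
        cuts.add(pos)
    return cuts


def tokens_between(tokens, start, end):
    """Tokens of one tokenization that lie inside the span [start, end)."""
    pos, inside = 0, []
    for t in tokens:
        if start <= pos and pos + len(t) <= end:
            inside.append(t)
        pos += len(t)
    return inside


def combine(tokenizations, dictionary):
    """Merge tokenizations of the same sentence, span by span, by lowest cost."""
    anchors = sorted(set.intersection(*(boundaries(t) for t in tokenizations)))
    merged, start = [], 0
    for end in anchors:
        pieces = [tokens_between(t, start, end) for t in tokenizations]
        merged.extend(min(pieces, key=lambda p: lookup_cost(p, dictionary)))
        start = end
    return merged


# Toy dictionary (assumption) and the two tokenizer outputs from the slide.
dictionary = {"你", "是", "辣妹", "辣", "妹", "也是", "名门贵族", "族"}
tok_a = ["你", "是", "辣妹", "也是", "名门贵", "族"]
tok_b = ["你", "是辣", "妹", "也是", "名门贵族"]

merged = combine([tok_a, tok_b], dictionary)
print(merged, lookup_cost(merged, dictionary))
```

Neither input tokenization is fully correct, yet the merged result takes the good span from each one.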
24
Dictionaries are important for language learning
Manual approach provides high-quality dictionary,
but not scalable
About 7000 languages in the world
About 49 million possible bilingual dictionaries (7000 × 6999 ordered language pairs)
Thus we need an automatic approach
25
Lots of online dictionaries available
Could we automatically learn new dictionaries
from them?
Focus on Chinese-English (C-E) & Korean-English (K-E) bilingual dictionaries
26
Lots of dictionaries online
Some are C-E and K-E, but many are not
Many dictionaries are C-X and X-E
Use language X as bridge/pivot
C-X + X-E => C-E, e.g.,
辣妹->fille sexy + fille sexy ->hot chick
=> 辣妹-> hot chick
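The two-hop pivot step amounts to composing two mappings. A minimal sketch, assuming each dictionary is a plain mapping from a headword to a set of translations (the function name `pivot_compose` and the data layout are illustrative, not from the talk):

```python
from collections import defaultdict


def pivot_compose(src_to_pivot, pivot_to_tgt):
    """Build a source-to-target dictionary by going through a pivot language."""
    composed = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            # A pivot word with no target entry simply contributes nothing.
            for tgt_word in pivot_to_tgt.get(pivot_word, ()):
                composed[src_word].add(tgt_word)
    return dict(composed)


# The slide's example: Chinese -> French -> English.
chinese_french = {"辣妹": {"fille sexy"}}
french_english = {"fille sexy": {"hot chick"}}

chinese_english = pivot_compose(chinese_french, french_english)
print(chinese_english)
```

Ambiguous pivot words can introduce wrong pairs, which is consistent with the accuracy figures reported on the next slide being below 100%.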
27
Take 2 hops for now
Chinese-English dictionary has 750K entries
90% correct
Korean-English dictionary has 100K entries
99% correct
28
Learn bilingual dictionary using
Seed lexicon
Monolingual data (plentiful)
Maps bilingual phrases to vector space
[Figure: word pairs plotted in the vector space: dolphin / 海豚, Tokyo / 东京, Sushi / 寿司]
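Once phrases of both languages sit in one vector space, a translation candidate can be read off as the nearest neighbour in the other language. The sketch below only illustrates that last step: the 2-D vectors are made up, and how the shared space is learned from the seed lexicon is not shown in the slides and not attempted here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up vectors (assumption), pretending both languages share one space.
CHINESE = {"海豚": (0.9, 0.1), "东京": (0.1, 0.9), "寿司": (0.6, 0.6)}
ENGLISH = {"dolphin": (1.0, 0.0), "Tokyo": (0.0, 1.0), "sushi": (0.7, 0.7)}


def translate(chinese_word):
    """Return the English word whose vector is closest to the Chinese word's."""
    vector = CHINESE[chinese_word]
    return max(ENGLISH, key=lambda en: cosine(vector, ENGLISH[en]))


print({zh: translate(zh) for zh in CHINESE})
```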
29
30
31
Artifact of standard machine translation pipeline
Parallel sentences aligned word for word
Compute probability of mapping tokens of a
source language to those of a target language
A correct source token will be more
consistently aligned to its corresponding
target token(s)
Add high-probability mappings to dictionary
32
Chinese English P(C|E) P(E|C) AveProb
辣妹 hot chick 0.8 0.9 0.85
是辣 is curry 0.1 0.1 0.1
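The AveProb column is the mean of the two directional probabilities, and only high-scoring pairs enter the dictionary. A small sketch using the slide's two rows; the 0.5 threshold and the function names are assumptions, since the talk does not state the cut-off.

```python
def average_prob(p_c_given_e, p_e_given_c):
    """Mean of the two directional translation probabilities."""
    return (p_c_given_e + p_e_given_c) / 2


def select_entries(candidates, threshold=0.5):
    """Keep (Chinese, English) pairs whose average probability is high enough."""
    kept = {}
    for chinese, english, p_ce, p_ec in candidates:
        score = average_prob(p_ce, p_ec)
        if score >= threshold:  # threshold value is an assumption
            kept[(chinese, english)] = score
    return kept


# Rows from the slide: Chinese, English, P(C|E), P(E|C).
candidates = [
    ("辣妹", "hot chick", 0.8, 0.9),
    ("是辣", "is curry", 0.1, 0.1),
]

entries = select_entries(candidates)
print(entries)
```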
33
Chinese-English Dictionary
3 million Chinese tokens (Jan’17)
89% in dictionary
Korean-English Dictionary
4 million Korean tokens (Jan’17)
86% in dictionary
34
[Chart: #KoreanTokens vs. #Definitions; axis ticks 0 to 500000 and 1 to 39]
[Chart: #ChineseTokens vs. #Definitions; axis ticks 0 to 450000 and 1 to 28]
35
Match parallel sentences to
Phrase table
Dictionary
36
他 放弃 梦想
He gave up his dreams
Chinese English AveProb
放弃 gave up his 0.74
放弃 quit 0.83
放弃 abdicate 0.68
Phrase Table
37
他 放弃 梦想
He gave up his dreams
Chinese English AveProb
放弃 gave up his 0.74
放弃 quit 0.83
放弃 abdicate 0.68
Phrase Table
Best Match
他 放弃 梦想
He gave up his dreams
best match
38
Chinese English AveProb
放弃 gave up his 0.74
放弃 quit 0.83
放弃 abdicate 0.68
Phrase Table
best match
Chinese English
放弃 abandon
放弃 give up
放弃 abdicate
Dictionary
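One plausible reading of "match parallel sentences to the phrase table" is: among the phrase-table candidates for a Chinese token, keep those whose English side actually occurs in the aligned English sentence, then take the most probable one. The sketch below follows that reading; the matching rule and the function names are assumptions, and only the table rows and the sentence come from the slides.

```python
def contains(sentence_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in sentence_tokens."""
    n = len(phrase_tokens)
    return any(
        sentence_tokens[i:i + n] == phrase_tokens
        for i in range(len(sentence_tokens) - n + 1)
    )


def best_match(english_sentence, candidates):
    """Most probable candidate phrase that appears in the English sentence."""
    sentence = english_sentence.lower().split()
    matching = [
        (phrase, prob)
        for phrase, prob in candidates
        if contains(sentence, phrase.lower().split())
    ]
    return max(matching, key=lambda item: item[1], default=None)


# Phrase-table candidates for 放弃, from the slide.
phrase_table = [("gave up his", 0.74), ("quit", 0.83), ("abdicate", 0.68)]

match = best_match("He gave up his dreams", phrase_table)
print(match)
```

Under this reading "quit" loses despite its higher probability, because it does not occur in the parallel sentence.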
Drama Vocabulary Quiz
Liling Tan
Rakuten Institute of Technology (Singapore)
28 Oct 2017 @ Rakuten Tech. Conference
40
Overview
•Introduction
•Demo
•How did We Create the Quiz?
41
Introduction
•Quizzes are fun and could be viral
•But manually creating quizzes is tedious
•We created #DramaVocabQuiz that generates new
vocabulary quizzes automatically
42
43
44
45
46
47
48
How Do We Generate Quizzes Automatically?
49
Korean Drama Word List
• The word 미남 [minam] “handsome guy” can be followed by multiple suffixes at once, -이시라구요 [-issilaguyo], to form a single word meaning “someone said that he is handsome”.
• We only extract the root word 미남 [minam], and count it as a unique word type
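A toy illustration of counting root words as types. Real Korean morphology needs a proper morphological analyzer; the single hard-coded suffix below is taken from the slide's example, and the function name and the list-based approach are assumptions made only to show the counting step.

```python
from collections import Counter

# Suffix chain from the slide's example; a real system would not hard-code this.
SUFFIXES = ["이시라구요"]


def root_word(word):
    """Strip a known suffix chain, longest first, and return the root."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)]
    return word


subtitle_tokens = ["미남", "미남이시라구요"]
type_counts = Counter(root_word(token) for token in subtitle_tokens)
print(type_counts)
```

Both surface forms collapse to the single type 미남, so its frequency count is 2.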
50
Korean Drama Word List
51
Korean Drama Word List
52
Korean Drama Word List
53
Splitting Word List into 3 Difficulty Levels
54
Generate the Distractors
• Distractor 1: Select the top 5th to 20th closest words (cosine)
• Distractor 2: Use Distractor 1 as negative and question word as
positive, select 1st to 20th closest word (cosmul)
References:
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR.
• Omer Levy and Yoav Goldberg. 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL.
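The two distractor rules can be sketched with plain cosine similarity and the multiplicative (cosmul) score from the Levy and Goldberg reference, where each cosine is shifted into [0, 1] and the positive term is divided by the negative term. The toy vectors, the function names and the small `eps` constant are assumptions; a real system would use trained word embeddings and a much larger vocabulary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_cosine(word, vectors):
    """Other words sorted from most to least similar to `word`."""
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)


def rank_cosmul(positive, negative, vectors, eps=1e-6):
    """Words close to `positive` but far from `negative` (multiplicative score)."""
    def shifted(a, b):
        return (1 + cosine(vectors[a], vectors[b])) / 2  # map cosine into [0, 1]

    others = [w for w in vectors if w not in (positive, negative)]
    return sorted(
        others,
        key=lambda w: shifted(w, positive) / (shifted(w, negative) + eps),
        reverse=True,
    )


def pool(ranked, first, last):
    """Candidates ranked `first` to `last` (1-based, inclusive)."""
    return ranked[first - 1:last]


# Made-up 2-D vectors (assumption) standing in for trained embeddings.
VECTORS = {
    "handsome": (1.0, 0.0),
    "pretty": (0.9, 0.1),
    "cute": (0.8, 0.3),
    "ugly": (0.5, 0.5),
    "car": (0.0, 1.0),
}

by_cosine = rank_cosine("handsome", VECTORS)
by_cosmul = rank_cosmul("handsome", "cute", VECTORS)
print(by_cosine, by_cosmul)
```

With a full vocabulary, Distractor 1 would be drawn from `pool(by_cosine, 5, 20)` and Distractor 2 from `pool(by_cosmul, 1, 20)`, matching the rank ranges on the slide.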
55
Language Learners Like Quizzes!!
• 60,000+ quizzes taken
• 35,000+ unique users completed quiz
• 16% of the users repeated quiz
56
Word Frequency is a Good Indicator of Difficulty
[Chart: bars for Easy, Medium and Hard; axis ticks 0 to 10]
Easy = Frequent words
Medium = Less frequent words
Hard = Least frequent words
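A frequency-based split into the three levels could look like the sketch below. Cutting the ranked list into equal thirds is an assumption, since the slides do not say where the boundaries were placed; the placeholder words and counts are also made up.

```python
def split_levels(frequencies):
    """Rank words by frequency and cut the ranking into three levels."""
    ranked = [
        word
        for word, _ in sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    ]
    third = -(-len(ranked) // 3)  # ceiling division
    return {
        "easy": ranked[:third],             # most frequent words
        "medium": ranked[third:2 * third],  # less frequent words
        "hard": ranked[2 * third:],         # least frequent words
    }


# Placeholder words with made-up counts (assumption).
frequencies = {"w1": 60, "w2": 50, "w3": 40, "w4": 30, "w5": 20, "w6": 10}
levels = split_levels(frequencies)
print(levels)
```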
57
Conclusion
Watch Drama,
Learn Language
Quiz: https://languagequiz.viki.com
Techblog: https://techblog.rakuten.co.jp/2017/05/26/lang-quiz/
Oct.28.2017
Pang Zineng
Senior Technologist
Rakuten Institute of Technology Singapore
59
* Images from Rakuten VIKI
60
Web Search: pages
In-Video Search: clips
* Images from Rakuten VIKI
61
Web Search
• The meta data of the site
• The meta data of the page
• The word tokens in the page
• The topic of the page
• The originality of the page
• Hyperlinks (page rank)
Feature roles: site identifier, page identifier, content ranking, search relevancy
In-Video Search
• The meta data of the video
• The meta data of this clip (timestamp, length, URI, etc.)
• The caption text of the clip
• The frames & audio signal
• Complexity of the sentence
• Diversity of the clips
Feature roles: video identifier, clip identifier, search relevancy, content ranking
* Images from Rakuten VIKI
62
Job:
• Make some data ready for consumption.
Questions:
• How does the data come?
• What needs to be done for it to be ready?
• How will the data be consumed?
[Diagram components: Data provider, FTP, Raw Data, Trigger / monitor function, Pre-processing function, database, Data access function, API, Data consumer]
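The components named on this slide can be kept as separate small functions so that each one can change independently. The sketch below is only an illustration of that separation: the tab-separated record format, the table layout and all function names are assumptions, and an in-memory SQLite database stands in for whatever store was actually used.

```python
import sqlite3


def preprocess(raw_line):
    """Pre-processing function: normalise one raw 'token<TAB>definition' record."""
    token, definition = raw_line.rstrip("\n").split("\t", 1)
    return token.strip(), definition.strip()


def trigger(raw_lines, db):
    """Trigger / monitor function: push newly arrived raw records into the database."""
    rows = [preprocess(line) for line in raw_lines if line.strip()]
    db.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?)", rows)
    db.commit()
    return len(rows)


def lookup(db, token):
    """Data access function: what an API handler would call for the consumer."""
    row = db.execute(
        "SELECT definition FROM entries WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (token TEXT PRIMARY KEY, definition TEXT)")

loaded = trigger(["辣妹\thot chick\n", "\n"], db)
print(loaded, lookup(db, "辣妹"))
```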
63
Job:
• Let outsiders use a function.
Questions:
• How frequently will the function be used?
• What data does the function need?
[Diagram components: Web Application, API Endpoint, API Cache, Request Queue, Application logic, Application Cache, Internal/External Data]
64
Motion Dictionary
[Diagram components: Rakuten VIKI video contents, Rakuten TV video contents, Other video contents, Search function, 3rd Party Platform]
* Images from Rakuten VIKI
65
Interactive Subtitles (version 2) and Interactive Subtitles (version 3)
[Diagram components: dictionary function, voice function, tokenization function; Japanese, Korean and Chinese Dictionary Data; Japanese, Korean and Chinese Tokenization Data; sources labeled 3rd party solution, open source framework and In-house solution]
* Images from Rakuten VIKI
66
Interactive Subtitles (version 2) and Interactive Subtitles (version 3)
[Diagram components: dictionary function, voice function, tokenization function; Japanese, Korean, Chinese and Global Dictionary Data; Japanese, Korean, Chinese and Global Tokenization Data; sources labeled 3rd party solution, open source framework and In-house solution]
* Images from Rakuten VIKI
67
Vocab Quiz (version 1)
[Diagram components: Take Quiz function, Chinese Quiz Data, Korean Quiz Data]
* Images from Rakuten VIKI
68
Vocab Quiz (version 2)
[Diagram components: Take Quiz function, voice function, Chinese Quiz Data, Korean Quiz Data]
* Images from Rakuten VIKI
69
Fast iteration in R&D won’t be possible if we have many things bundled or coupled.
-- Pang
Vocab Quiz
• https://languagequiz.viki.com/
Learn Mode (PC/Mac only)
• https://www.viki.com/collections/316981l-learn-the-basics-chinese
• https://www.viki.com/collections/316939l-learn-the-basics-korean
Motion Dictionary
• TBD

Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Dernier (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 


  • 1. Oct.28.2017 Ewa Szymanska, PhD Head of Rakuten Institute of Technology Singapore
  • 3. “I am watching shows in Chinese to get used to ‘actual’ spoken Mandarin, and not just what I see in my textbooks.” (VIKI user)
  • 4. Images from Rakuten VIKI, Rakuten TV
  • 5. 1.8 billion people are learning foreign languages. [Charts: languages with most native speakers; most commonly studied foreign languages.] Source: The Washington Post: https://www.washingtonpost.com/news/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts
  • 6. The online individual language learning market is growing at 12% CAGR. Source: Rosetta Stone Investor Day 2017
  • 7. I. Entertaining Content; II. Global Users; III. Technology. (Photo by Jakob Owens on Unsplash)
  • 9. 1. Interactive subtitles. Fast adoption: 30,000 DAU (daily active users). High engagement: Korean Learn Mode users view 10% more than the Viki average. High satisfaction: 83 NPS (net promoter score). Sources: cnet.com @ CBS Interactive Inc., Apr 13, 2017; Keia.org, Korean Economic Institute, Apr 2017; Forbes, Oct 24, 2017; The Verge, Sep 28, 2017
  • 10. Shows availability [ Learn Mode collection on viki.com ]: “Daughter Back”, “Return of Happiness”, “Ice and Fire of Youth”, “My Love from the Star”, “Boys Over Flowers”, “Descendants of the Sun”. Learn Chinese (Japan); Learn Korean (USA). Images from Rakuten VIKI
  • 11. 2. Drama Vocab Quiz [ languagequiz.viki.com ]: 60,000+ quizzes taken; 35,000+ users completed the quiz; very positive social media engagement.
  • 12. 3. Video-based Dictionary. Integrate with the classroom curriculum.
  • 13. “If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.” - Nelson Mandela
  • 15. Oct 28, 2017 Stanley Kok Principal Research Scientist Rakuten Institute of Technology (Singapore)
  • 16. Tokenization: splitting a sentence into pieces, each preserving its original semantics. Sentence: 你是辣妹,也是名门贵族. Good split: 你 | 是 | 辣妹 | , | 也是 | 名门贵族 = "you are (a) hot chick and also (of) the gentry". Bad split: 你 | 是 | 辣妹 | , | 也是 | 名门贵 | 族 = "you are (a) hot chick and also tribe".
  • 17. Sentence: 努力的人才会成功. Split 努力 | 的 | 人 | 才 | 会 | 成功 = "only hardworking people will succeed". Split 努力 | 的 | 人才 | 会 | 成功 = "hardworking talent will succeed".
  • 20. Many open-source tokenizers are available. They are good, but not perfect, and they make different mistakes. Why not use more (or all) of them to improve tokenization? The strengths of one tokenizer overcome the shortcomings of another.
  • 21. How to quantify the “goodness” of a tokenization? Take the human learner’s perspective: count the dictionary look-ups needed to understand all tokens. Tokens that do not exist in the dictionary are assumed to need a large number of look-ups (10). Examples: 你 | 是 | 辣妹 (you, are, hot chick): 1 + 1 + 1 = 3. 你 | 是 | 辣 | 妹 (you, are, spicy, younger sister): 1 + 1 + 1 + 1 = 4. 你 | 是辣 | 妹 (you, ?, younger sister): 1 + 10 + 1 = 12.
  • 22. We can do better than picking the lowest-cost tokenization among the tokenizers: treat the tokens they have in common as “anchor points”, then pick the best tokens from the remaining ones.
  • 23. Example with look-up costs. 你 | 是 | 辣妹 | 也是 | 名门贵 | 族 ("you are hot chick and also tribe"): cost 15. 你 | 是辣 | 妹 | 也是 | 名门贵族 ("you, younger sister, and also (of) the gentry"): cost 14. 你 | 是 | 辣妹 | 也是 | 名门贵族: cost 5.
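
The look-up cost from slide 21 and the anchor-point idea from slides 22-23 can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy `DICTIONARY`, the function names, and the way anchors are found (cut positions shared by every tokenizer output) are hypothetical and not taken from the talk.

```python
UNKNOWN_COST = 10  # look-ups assumed for a token missing from the dictionary (slide 21)

# Toy dictionary, assumed for illustration only.
DICTIONARY = {"你", "是", "辣", "妹", "辣妹", "也是", "族", "名门贵族"}


def cost(tokens, dictionary):
    """Number of dictionary look-ups a learner needs for these tokens."""
    return sum(1 if t in dictionary else UNKNOWN_COST for t in tokens)


def boundaries(tokens):
    """Character offsets at which this tokenization cuts the sentence."""
    pos, cuts = 0, {0}
    for t in tokens:
        pos += len(t)
        cuts.add(pos)
    return cuts


def segment(tokens, shared_cuts):
    """Group tokens into runs that end at cut points shared by all tokenizers."""
    runs, run, pos = [], [], 0
    for t in tokens:
        run.append(t)
        pos += len(t)
        if pos in shared_cuts:
            runs.append(run)
            run = []
    return runs


def combine(tokenizations, dictionary):
    """Between shared cut points, keep whichever tokenizer's run is cheapest."""
    shared = set.intersection(*(boundaries(t) for t in tokenizations))
    per_tokenizer = [segment(t, shared) for t in tokenizations]
    best = []
    for candidates in zip(*per_tokenizer):
        best.extend(min(candidates, key=lambda run: cost(run, dictionary)))
    return best


tok_a = ["你", "是", "辣妹", "也是", "名门贵", "族"]   # cost 15
tok_b = ["你", "是辣", "妹", "也是", "名门贵族"]       # cost 14
combined = combine([tok_a, tok_b], DICTIONARY)
print(combined, cost(combined, DICTIONARY))
# ['你', '是', '辣妹', '也是', '名门贵族'] 5
```

Runs where both tokenizers agree (here 你 and 也是) act as the anchors; only the disagreeing stretches between them are compared by cost.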
  • 24. Dictionaries are important for language learning. The manual approach provides a high-quality dictionary but is not scalable: there are about 7000 languages in the world, hence about 49 million possible bilingual dictionaries (7000 × 7000). Thus we need an automatic approach.
  • 25. Lots of online dictionaries are available. Could we automatically learn new dictionaries from them? Focus on Chinese-English (C-E) and Korean-English (K-E) bilingual dictionaries.
  • 26. Lots of dictionaries are online. Some are C-E and K-E, but many are not: many are C-X and X-E. Use language X as a bridge/pivot: C-X + X-E => C-E, e.g., 辣妹 -> fille sexy + fille sexy -> hot chick => 辣妹 -> hot chick.
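
The pivot step C-X + X-E => C-E amounts to composing two dictionaries. A minimal sketch, assuming each dictionary is a plain mapping from a term to a set of translations; the function name `pivot` and the data layout are hypothetical, and only the 辣妹 entry comes from the slide.

```python
from collections import defaultdict


def pivot(src_to_x, x_to_tgt):
    """Compose a source->X and an X->target dictionary into source->target."""
    out = defaultdict(set)
    for src, pivots in src_to_x.items():
        for x in pivots:
            # A pivot term missing from the second dictionary contributes nothing.
            out[src].update(x_to_tgt.get(x, ()))
    # Drop source terms for which no pivot term could be translated.
    return {src: tgt for src, tgt in out.items() if tgt}


chinese_french = {"辣妹": {"fille sexy"}}
french_english = {"fille sexy": {"hot chick"}}
print(pivot(chinese_french, french_english))
# {'辣妹': {'hot chick'}}
```

Because a pivot term can be ambiguous, composed entries are not guaranteed to be correct, which fits the accuracy figures below 100% reported on the next slide.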
  • 27. Take 2 hops for now. The Chinese-English dictionary has 750K entries, 90% correct. The Korean-English dictionary has 100K entries, 99% correct.
  • 28. Learn a bilingual dictionary using a seed lexicon and monolingual data (plentiful). Maps bilingual phrases to a vector space, e.g., dolphin / 海豚, Tokyo / 东京, sushi / 寿司.
  • 31. An artifact of the standard machine translation pipeline. Parallel sentences are aligned word for word. Compute the probability of mapping tokens of a source language to those of a target language. A correct source token will be more consistently aligned to its corresponding target token(s). Add high-probability mappings to the dictionary.
  • 32. Example mappings:
      Chinese   English     P(C|E)   P(E|C)   AveProb
      辣妹      hot chick   0.8      0.9      0.85
      是辣      is curry    0.1      0.1      0.1
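
The AveProb column is consistent with the mean of the two conditional probabilities: (0.8 + 0.9) / 2 = 0.85 and (0.1 + 0.1) / 2 = 0.1. A minimal sketch of filtering on it, assuming that reading; the 0.5 threshold and the function names are hypothetical, since the slides do not say what counts as "high probability".

```python
def average_prob(p_c_given_e, p_e_given_c):
    """Mean of the two directional alignment probabilities."""
    return (p_c_given_e + p_e_given_c) / 2


def high_probability_entries(phrase_table, threshold=0.5):
    """Keep (Chinese, English) pairs whose average probability reaches the threshold."""
    return [
        (chinese, english)
        for chinese, english, p_ce, p_ec in phrase_table
        if average_prob(p_ce, p_ec) >= threshold
    ]


# Rows from slide 32: (Chinese, English, P(C|E), P(E|C))
phrase_table = [
    ("辣妹", "hot chick", 0.8, 0.9),
    ("是辣", "is curry", 0.1, 0.1),
]
print(high_probability_entries(phrase_table))
# [('辣妹', 'hot chick')]
```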
  • 33. Chinese-English dictionary: 3 million Chinese tokens (Jan ’17), 89% in dictionary. Korean-English dictionary: 4 million Korean tokens (Jan ’17), 86% in dictionary.
  • 34. [Two charts: #KoreanTokens vs. #Definitions; #ChineseTokens vs. #Definitions]
  • 35. Match parallel sentences to the phrase table and the dictionary.
  • 36. Parallel sentences: 他 放弃 梦想 / He gave up his dreams. Phrase table (Chinese, English, AveProb): 放弃, gave up his, 0.74; 放弃, quit, 0.83; 放弃, abdicate, 0.68.
  • 37. Parallel sentences: 他 放弃 梦想 / He gave up his dreams. Phrase table (Chinese, English, AveProb): 放弃, gave up his, 0.74; 放弃, quit, 0.83; 放弃, abdicate, 0.68. Best match is selected from the phrase table.
  • 38. Parallel sentences: 他 放弃 梦想 / He gave up his dreams. Phrase table (Chinese, English, AveProb): 放弃, gave up his, 0.74; 放弃, quit, 0.83; 放弃, abdicate, 0.68. Dictionary (Chinese, English): 放弃, abandon; 放弃, give up; 放弃, abdicate. A best match is selected from the phrase table and from the dictionary.
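
The slides do not say how the best match is chosen. One plausible reading, sketched here purely as an assumption, is to keep only the candidate translations that actually occur in the parallel English sentence and then take the one with the highest AveProb. The function names are hypothetical.

```python
def contains_phrase(sentence_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run inside sentence_tokens."""
    n = len(phrase_tokens)
    return any(
        sentence_tokens[i:i + n] == phrase_tokens
        for i in range(len(sentence_tokens) - n + 1)
    )


def best_match(english_sentence, candidates):
    """Pick the highest-probability candidate that appears in the sentence.

    candidates: list of (english_phrase, ave_prob) for one Chinese token.
    Returns None when no candidate occurs in the sentence.
    """
    words = english_sentence.lower().split()
    matching = [
        (phrase, prob)
        for phrase, prob in candidates
        if contains_phrase(words, phrase.lower().split())
    ]
    return max(matching, key=lambda item: item[1], default=None)


# Phrase-table rows for 放弃 from slides 36-38.
candidates = [("gave up his", 0.74), ("quit", 0.83), ("abdicate", 0.68)]
print(best_match("He gave up his dreams", candidates))
# ('gave up his', 0.74)
```

Under this reading "quit" loses despite its higher score, because it does not appear in the subtitle's own English translation.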
  • 39. Drama Vocabulary Quiz Liling Tan Rakuten Institute of Technology (Singapore) 28 Oct 2017 @ Rakuten Tech. Conference
  • 41. Introduction. Quizzes are fun and could go viral, but manually creating quizzes is tedious. We created #DramaVocabQuiz, which generates new vocabulary quizzes automatically.
  • 48. How do we generate quizzes automatically?
  • 49. Korean Drama Word List. The word 미남 [minam] “handsome guy” can be followed by multiple suffixes at once, -이시라구요 [-issilaguyo], to form a single word meaning “someone said that he is handsome”. We only extract the root word 미남 [minam] and count it as a unique word type.
  • 53. Splitting Word List into 3 Difficulty Levels
  • 54. Generate the Distractors. Distractor 1: select the top 5th to 20th closest words (cosine). Distractor 2: use Distractor 1 as negative and the question word as positive, and select the 1st to 20th closest words (cosmul). References: Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR. Omer Levy and Yoav Goldberg. 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL.
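
The two distractor rules can be sketched with plain cosine similarity and a multiplicative (cosmul-style) score in the spirit of Levy and Goldberg (2014): similarity to the positive word divided by similarity to the negative word, with cosines shifted into [0, 1]. This is a dependency-free illustration, not the talk's implementation; the function names, the toy 2-D vectors, and the small `eps` constant are assumptions, and how a single distractor is drawn from each candidate range is not stated in the slides.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distractor1_candidates(question, vectors, lo=5, hi=20):
    """Words ranked lo-th to hi-th closest to the question word by cosine."""
    others = [w for w in vectors if w != question]
    ranked = sorted(others, key=lambda w: cosine(vectors[w], vectors[question]), reverse=True)
    return ranked[lo - 1:hi]


def distractor2_candidates(question, distractor1, vectors, top=20, eps=1e-6):
    """Top words by a cosmul-style score: question positive, distractor 1 negative."""
    def score(w):
        pos = (1 + cosine(vectors[w], vectors[question])) / 2
        neg = (1 + cosine(vectors[w], vectors[distractor1])) / 2
        return pos / (neg + eps)

    others = [w for w in vectors if w not in (question, distractor1)]
    return sorted(others, key=score, reverse=True)[:top]


# Toy embeddings, for illustration only.
toy = {"q": (1.0, 0.0), "a": (0.9, 0.1), "b": (0.5, 0.5), "c": (0.0, 1.0), "d": (-1.0, 0.0)}
print(distractor1_candidates("q", toy, lo=2, hi=3))   # ['b', 'c']
print(distractor2_candidates("q", "b", toy, top=2))   # ['a', 'c']
```

Skipping the very closest words for Distractor 1 avoids near-synonyms of the answer, while the negative term for Distractor 2 steers it away from Distractor 1 so the two wrong options differ.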
  • 55. Language Learners Like Quizzes! 60,000+ quizzes taken; 35,000+ unique users completed the quiz; 16% of the users repeated the quiz.
  • 56. Word Frequency is a Good Indicator of Difficulty. [Bar chart over the Easy, Medium, and Hard levels.] Easy = frequent words; Medium = less frequent words; Hard = least frequent words.
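
A frequency-based split into three levels might look like the sketch below. Equal-sized tiers are an assumption made for illustration; the slides do not give the actual cut-offs, and the function name is hypothetical.

```python
from collections import Counter


def split_by_frequency(root_words):
    """Rank word types by frequency and cut the ranking into three tiers."""
    ranked = [word for word, _ in Counter(root_words).most_common()]
    tier = -(-len(ranked) // 3)  # ceiling division, so no word type is dropped
    return {
        "easy": ranked[:tier],            # most frequent words
        "medium": ranked[tier:2 * tier],  # less frequent words
        "hard": ranked[2 * tier:],        # least frequent words
    }


# Toy corpus of already-extracted root words (placeholders, not real data).
corpus = ["a"] * 6 + ["b"] * 5 + ["c"] * 4 + ["d"] * 3 + ["e"] * 2 + ["f"]
print(split_by_frequency(corpus))
# {'easy': ['a', 'b'], 'medium': ['c', 'd'], 'hard': ['e', 'f']}
```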
  • 57. Conclusion: Watch Drama, Learn Language. Quiz: https://languagequiz.viki.com Techblog: https://techblog.rakuten.co.jp/2017/05/26/lang-quiz/
  • 58. Oct.28.2017 Pang Zineng Senior Technologist Rakuten Institute of Technology Singapore
  • 59. Images from Rakuten VIKI
  • 60. Web Search (pages) vs. In-Video Search (clips). Images from Rakuten VIKI
  • 61. Web Search vs. In-Video Search. Web Search uses: the meta data of the site; the meta data of the page; the word tokens in the page; the topic of the page; the originality of the page; hyperlinks (page rank). Roles: site identifier, page identifier, content ranking, search relevancy. In-Video Search uses: the meta data of the video; the meta data of the clip (timestamp, length, URI, etc.); the caption text of the clip; the frames and audio signal; the complexity of the sentence; the diversity of the clips. Roles: video identifier, clip identifier, search relevancy, content ranking. Images from Rakuten VIKI
  • 62. Job: make some data ready for consumption. Questions: How does the data come? What needs to be done for it to be ready? How will the data be consumed? Diagram components: data provider, FTP, API, raw data, trigger/monitor function, pre-processing function, database, data access function, data consumer.
  • 63. Job: let outsiders use a function. Questions: How frequently will the function be used? What data does the function need? Diagram components: web application, API endpoint, request queue, API cache, application logic, application cache, internal/external data.
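
How a request queue and an API cache can sit in front of the application logic is sketched below. This is a toy, in-process illustration of the components named on slide 63; the class name `Endpoint`, its methods, and the use of `str.upper` as stand-in application logic are all hypothetical.

```python
from collections import deque


class Endpoint:
    """Toy API endpoint: queue incoming requests, answer repeats from a cache."""

    def __init__(self, logic):
        self.logic = logic      # the application logic being exposed
        self.cache = {}         # API cache: request -> response
        self.queue = deque()    # request queue
        self.logic_calls = 0    # how often the logic actually ran

    def submit(self, request):
        self.queue.append(request)

    def drain(self):
        """Process queued requests in arrival order and return their responses."""
        responses = []
        while self.queue:
            request = self.queue.popleft()
            if request not in self.cache:
                self.logic_calls += 1
                self.cache[request] = self.logic(request)
            responses.append(self.cache[request])
        return responses


endpoint = Endpoint(str.upper)
for request in ["hello", "world", "hello"]:
    endpoint.submit(request)
print(endpoint.drain(), endpoint.logic_calls)
# ['HELLO', 'WORLD', 'HELLO'] 2
```

The more frequently the same request arrives, the more the cache pays off, which is why usage frequency is one of the questions the slide asks.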
  • 64. Motion Dictionary. Diagram components: Rakuten VIKI video contents, Rakuten TV video contents, other video contents, search function, 3rd party platform. Images from Rakuten VIKI
  • 65. Interactive Subtitles (version 2) and Interactive Subtitles (version 3). Dictionary function: Japanese Dictionary Data, Korean Dictionary Data, Chinese Dictionary Data (3rd party solution, open source framework). Voice function: 3rd party solution. Tokenization function: Korean, Chinese, and Japanese Tokenization Data (open source framework); Korean and Chinese Tokenization Data (in-house solution). Images from Rakuten VIKI
  • 66. Interactive Subtitles (version 2) and Interactive Subtitles (version 3). Dictionary function: Japanese Dictionary Data, Korean Dictionary Data, Chinese Dictionary Data (3rd party solution, open source framework); Global Dictionary Data (in-house solution). Voice function: 3rd party solution. Tokenization function: Japanese Tokenization Data (open source framework); Korean and Chinese Tokenization Data (in-house solution); Global Tokenization Data (in-house solution). Images from Rakuten VIKI
  • 67. Vocab Quiz (version 1): Take Quiz function; Chinese Quiz Data; Korean Quiz Data. Images from Rakuten VIKI
  • 68. Vocab Quiz (version 2): Take Quiz function; voice function; Chinese Quiz Data; Korean Quiz Data. Images from Rakuten VIKI
  • 69. “Fast iteration in R&D won’t be possible if we had many things bundled or coupled.” - Pang. Vocab Quiz: https://languagequiz.viki.com/ Learn Mode (PC/Mac only): https://www.viki.com/collections/316981l-learn-the-basics-chinese and https://www.viki.com/collections/316939l-learn-the-basics-korean Motion Dictionary: TBD