Enable Fast Iteration in R&D
- Use modular, loosely coupled architectures so changes don't have widespread impact
- Automate testing and deployments to streamline the development cycle
- Implement continuous integration/delivery to get feedback quickly
- Empower cross-functional teams with autonomy over their work
- Adopt agile methodologies like Scrum and Kanban to support experimentation
- Colocate teams physically to facilitate collaboration and rapid problem-solving
- Leverage cloud infrastructure for flexible, on-demand compute resources
- Invest in tools that enhance developer productivity, like IDEs, version control, etc.
- Foster a culture
This document provides an overview of Rakuten VIKI's language learning tools and technologies. It discusses interactive subtitles, vocabulary quizzes, and motion dictionaries. For vocabulary quizzes, it describes how quizzes are generated automatically using word embeddings and lists of drama vocabulary. Over 60,000 quizzes have been taken so far, with very positive engagement on social media. Interactive subtitles let learners access definitions, translations, and quizzes directly within video content. The document also presents Rakuten's approach to building cross-language dictionaries and tokenizers at scale by mining online sources and using machine translation models. Overall, the document outlines Rakuten VIKI's suite of products and technologies for entertaining and effective foreign
5.
1.8 billion people are learning foreign languages
Source: The Washington Post: https://www.washingtonpost.com/news/worldviews/wp/2015/04/23/the-worlds-languages-in-7-maps-and-charts
[Charts: Languages with most native speakers; Most commonly studied foreign languages]
9.
1 Interactive subtitles
• Fast adoption: 30,000 DAU (daily active users)
• High engagement: Korean Learn Mode users view 10% more than the Viki average
• High satisfaction: 83 NPS (net promoter score)
*cnet.com @ CBS Interactive Inc. Apr 13, 2017; Keia.org, Korean Economic Institute, Apr 2017; Forbes Oct 24, 2017; The Verge, Sep 28, 2017
10. Shows availability
Learn Chinese (Japan): “Daughter Back”, “Return of Happiness”, “Ice and Fire of Youth”
Learn Korean (USA): “My Love from the Star”, “Boys Over Flowers”, “Descendants of the Sun”
* Images from Rakuten VIKI
[ Learn Mode collection on viki.com ]
11.
2 Drama Vocab Quiz [ languagequiz.viki.com ]
• 60,000+ quizzes taken
• 35,000+ users completed the quiz
• Very positive social media engagement
13.
“If you talk to a man in a language he understands, that goes to his head. If you talk to him in his language, that goes to his heart.”
- Nelson Mandela
15. Oct 28, 2017
Stanley Kok
Principal Research Scientist
Rakuten Institute of Technology (Singapore)
16.
Tokenization: splitting a sentence into pieces, each preserving its original semantics
你是辣妹,也是名门贵族
你 是 辣妹 , 也是 名门贵族 → you are (a) hot chick and also (of) the gentry
你 是 辣妹 , 也是 名门贵 族 → you are (a) hot chick and also tribe
17.
努力的人才会成功
努力 的 人 才 会 成功 → only hardworking people will succeed
努力 的 人才 会 成功 → hardworking talent will succeed
20.
Many open-source tokenizers available
Good, but not perfect
Different tokenizers make different mistakes
Why not use more (or all) of them to improve tokenization?
The strengths of one tokenizer overcome the shortcomings of another
21.
How to quantify the “goodness” of a tokenization?
Take the human learner’s perspective: count the dictionary look-ups needed to understand all tokens
Non-existent tokens are assumed to need a large number of look-ups (10)
你 是 辣妹 → you | are | hot chick → 1 + 1 + 1 = 3
你 是 辣 妹 → you | are | spicy | younger sister → 1 + 1 + 1 + 1 = 4
你 是辣 妹 → you | ? | younger sister → 1 + 10 + 1 = 12
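The look-up cost above can be written as a short Python sketch that reproduces the three scores. The toy dictionary and the rule that every known token costs exactly one look-up are assumptions inferred from the example, not Rakuten's actual scoring code.

```python
# Sketch of the look-up-cost metric for judging a tokenization.
# Assumptions: each token found in the dictionary costs 1 look-up, and
# TOY_DICTIONARY is a hypothetical stand-in for a real dictionary.
UNKNOWN_TOKEN_COST = 10  # non-existent tokens are assumed to need 10 look-ups

TOY_DICTIONARY = {
    "你": "you",
    "是": "are",
    "辣妹": "hot chick",
    "辣": "spicy",
    "妹": "younger sister",
}


def lookup_cost(tokens, dictionary=TOY_DICTIONARY, unknown_cost=UNKNOWN_TOKEN_COST):
    """Number of dictionary look-ups needed to understand all tokens."""
    return sum(1 if token in dictionary else unknown_cost for token in tokens)


candidates = [["你", "是", "辣妹"], ["你", "是", "辣", "妹"], ["你", "是辣", "妹"]]
for tokens in candidates:
    print(" ".join(tokens), lookup_cost(tokens))  # costs 3, 4, 12
best = min(candidates, key=lookup_cost)
print("lowest cost:", " ".join(best))
```

Choosing the candidate with the lowest cost corresponds to "picking the lowest cost tokenization from tokenizers". That is the baseline the next slide improves on.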
22.
We can do better than picking the lowest-cost tokenization among the tokenizers' outputs
Treat common tokens as “anchor points”
Pick the best tokens from the remaining ones
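23.
Tokenizer A: 你 是 辣妹 也是 名门贵 族 → you | are | hot chick | and also | ? | tribe (cost 15)
Tokenizer B: 你 是辣 妹 也是 名门贵族 → you | ? | younger sister | and also | (of) the gentry (cost 14)
Combined: 你 是 辣妹 也是 名门贵族 → you | are | hot chick | and also | (of) the gentry (cost 5)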
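The anchor-point idea can be sketched in Python as follows. The slides do not give the exact procedure, so this sketch rests on several assumptions. An anchor is a token that every tokenizer emits at the same character offsets. Between two consecutive anchors, the sketch keeps the segment from whichever tokenizer needs the fewest look-ups. The dictionary is a hypothetical toy. On the example above, it turns costs of 15 and 14 into 5.

```python
# Sketch of combining tokenizer outputs via "anchor points".
# Assumptions (not from the slides): anchors are tokens every tokenizer emits
# at identical character offsets; between anchors we keep the lowest-cost
# segment. TOY_DICTIONARY is hypothetical.
UNKNOWN_TOKEN_COST = 10
TOY_DICTIONARY = {"你", "是", "辣妹", "辣", "妹", "也是", "名门贵族", "族"}


def cost(tokens):
    """Dictionary look-ups: 1 per known token, UNKNOWN_TOKEN_COST per unknown one."""
    return sum(1 if t in TOY_DICTIONARY else UNKNOWN_TOKEN_COST for t in tokens)


def spans(tokens):
    """Turn a token list into (start, end, token) character spans."""
    result, pos = [], 0
    for t in tokens:
        result.append((pos, pos + len(t), t))
        pos += len(t)
    return result


def combine(tokenizations):
    all_spans = [spans(toks) for toks in tokenizations]
    # Anchor points: spans produced identically by every tokenizer.
    anchors = set.intersection(*(set(s) for s in all_spans))
    text_len = all_spans[0][-1][1]
    # Anchor edges are token boundaries in every tokenizer, so each tokenizer's
    # tokens exactly cover every segment between consecutive cuts.
    cuts = sorted({0, text_len} | {a[0] for a in anchors} | {a[1] for a in anchors})
    best = []
    for start, end in zip(cuts, cuts[1:]):
        options = [[t for s, e, t in sp if start <= s and e <= end] for sp in all_spans]
        best.extend(min(options, key=cost))
    return best


a = ["你", "是", "辣妹", "也是", "名门贵", "族"]
b = ["你", "是辣", "妹", "也是", "名门贵族"]
merged = combine([a, b])
print(" ".join(merged), cost(a), cost(b), cost(merged))
```

In this sketch the anchors are 你 and 也是. The gap 是辣妹 is taken from A and the gap 名门贵族 from B. If no anchors exist, the procedure falls back to choosing the single cheapest tokenization.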
24.
Dictionaries are important for language learning
A manual approach provides high-quality dictionaries, but it does not scale
About 7,000 languages in the world
About 49 million possible bilingual dictionaries (roughly 7,000 × 7,000 directed language pairs)
Thus we need an automatic approach
25.
Lots of online dictionaries available
Could we automatically learn new dictionaries from them?
Focus on Chinese-English (C-E) & Korean-English (K-E) bilingual dictionaries
26.
Lots of dictionaries online
Some are C-E and K-E, but many are not
Many dictionaries are C-X and X-E
Use language X as a bridge/pivot
C-X + X-E => C-E, e.g.,
辣妹 -> fille sexy + fille sexy -> hot chick => 辣妹 -> hot chick
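The pivot step (C-X + X-E => C-E) is a join on the pivot-language entry. The Python sketch below assumes each dictionary is a mapping from a word to a set of translations. The toy entries come from the 辣妹 example and are not real dictionary data.

```python
from collections import defaultdict


def pivot_compose(src_to_pivot, pivot_to_tgt):
    """C-X + X-E => C-E: join two bilingual dictionaries on the pivot-language entry.

    Assumes each dictionary maps a word to a set of translations.
    """
    composed = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            for tgt_word in pivot_to_tgt.get(pivot_word, ()):
                composed[src_word].add(tgt_word)
    return dict(composed)


zh_fr = {"辣妹": {"fille sexy"}}       # toy C-X dictionary (X = French)
fr_en = {"fille sexy": {"hot chick"}}  # toy X-E dictionary
print(pivot_compose(zh_fr, fr_en))     # {'辣妹': {'hot chick'}}
```

A real pivoted dictionary usually needs filtering. An ambiguous pivot word can chain unrelated senses together.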
27.
Take 2 hops for now
Chinese-English dictionary: 750K entries, 90% correct
Korean-English dictionary: 100K entries, 99% correct
28.
Learn a bilingual dictionary using a seed lexicon and monolingual data (plentiful)
Map bilingual phrases into a shared vector space
[Figure: word pairs in the vector space: dolphin/海豚, Tokyo/东京, sushi/寿司]
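The slide does not say how the shared vector space is built. This Python sketch swaps in one well-known option, so treat it as an assumption. A linear map from source-language to target-language embeddings is fitted on the seed lexicon by least squares. An unseen word is then translated by finding its nearest neighbour (cosine) in the target space. The 2-D vectors are made up so that the English space is the Chinese one rotated by 90 degrees.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a @ x = b for x by Gauss-Jordan elimination (a square and invertible)."""
    n = len(a)
    aug = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(aug[r][i]))  # partial pivoting
        aug[i], aug[pivot] = aug[pivot], aug[i]
        p = aug[i][i]
        aug[i] = [v / p for v in aug[i]]
        for r in range(n):
            if r != i:
                f = aug[r][i]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[i])]
    return [row[n:] for row in aug]


def fit_linear_map(seed_lexicon, src_vecs, tgt_vecs):
    """Least-squares W minimising ||XW - Y||, i.e. W = (X^T X)^-1 X^T Y."""
    x = [src_vecs[s] for s, _ in seed_lexicon]
    y = [tgt_vecs[t] for _, t in seed_lexicon]
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def translate(word, w, src_vecs, tgt_vecs):
    """Map a source word into the target space and return its nearest target word."""
    mapped = matmul([src_vecs[word]], w)[0]
    return max(tgt_vecs, key=lambda t: cosine(mapped, tgt_vecs[t]))


# Hypothetical toy monolingual embeddings.
zh = {"海豚": [1.0, 0.0], "东京": [0.0, 1.0], "寿司": [1.0, 1.0]}
en = {"dolphin": [0.0, 1.0], "Tokyo": [-1.0, 0.0], "sushi": [-1.0, 1.0]}
W = fit_linear_map([("海豚", "dolphin"), ("东京", "Tokyo")], zh, en)
print(translate("寿司", W, zh, en))  # sushi
```

Only the two seed pairs are used for fitting. 寿司 → sushi is then inferred, which is how a small seed lexicon plus plentiful monolingual data could grow a dictionary.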
31.
Artifact of the standard machine translation pipeline
Parallel sentences are aligned word for word
Compute the probability of mapping tokens of a source language to those of a target language
A correct source token will be aligned more consistently to its corresponding target token(s)
Add high-probability mappings to the dictionary
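The steps above can be sketched in Python as relative-frequency estimates over word-alignment links. This gives p(target | source) = count(source aligned to target) / count(source aligned to anything). Only entries above a threshold are kept. The toy corpus, its alignment links, and the 0.5 threshold are all hypothetical; a real pipeline would take the alignments from its MT word aligner.

```python
from collections import Counter


def lexical_table(aligned_corpus):
    """Estimate p(target | source) from word-aligned parallel sentences.

    aligned_corpus: list of (src_tokens, tgt_tokens, links), where links are
    (i, j) pairs meaning src_tokens[i] is aligned to tgt_tokens[j].
    """
    pair_counts, src_counts = Counter(), Counter()
    for src, tgt, links in aligned_corpus:
        for i, j in links:
            pair_counts[(src[i], tgt[j])] += 1
            src_counts[src[i]] += 1
    return {(s, t): n / src_counts[s] for (s, t), n in pair_counts.items()}


def to_dictionary(table, threshold=0.5):
    """Keep only high-probability mappings (the threshold is an arbitrary choice)."""
    return sorted(pair for pair, p in table.items() if p >= threshold)


corpus = [  # hypothetical aligned sentences
    (["他", "放弃", "梦想"], ["he", "quit", "dreams"], [(0, 0), (1, 1), (2, 2)]),
    (["她", "放弃", "工作"], ["she", "quit", "work"], [(0, 0), (1, 1), (2, 2)]),
    (["他", "放弃", "了"], ["he", "abdicated"], [(0, 0), (1, 1)]),
]
table = lexical_table(corpus)
print(table[("放弃", "quit")])  # 2 of the 3 links from 放弃 go to "quit"
print(to_dictionary(table))
```

Here 放弃 → quit survives the threshold because it is the more consistent alignment. The one-off 放弃 → abdicated does not.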
33.
Chinese-English Dictionary
3 million Chinese tokens (Jan ’17), 89% in dictionary
Korean-English Dictionary
4 million Korean tokens (Jan ’17), 86% in dictionary
36.
他 放弃 梦想
He gave up his dreams
Phrase Table
Chinese | English | AveProb
放弃 | gave up his | 0.74
放弃 | quit | 0.83
放弃 | abdicate | 0.68
37.
他 放弃 梦想
He gave up his dreams
Phrase Table
Chinese | English | AveProb
放弃 | gave up his | 0.74
放弃 | quit | 0.83
放弃 | abdicate | 0.68
Best Match
38.
他 放弃 梦想
He gave up his dreams
best match
Phrase Table
Chinese | English | AveProb
放弃 | gave up his | 0.74
放弃 | quit | 0.83
放弃 | abdicate | 0.68
best match
Dictionary
Chinese | English
放弃 | abandon
放弃 | give up
放弃 | abdicate
39. Drama Vocabulary Quiz
Liling Tan
Rakuten Institute of Technology (Singapore)
28 Oct 2017 @ Rakuten Tech. Conference
41.
Introduction
• Quizzes are fun and could go viral
• But manually creating quizzes is tedious
• We created #DramaVocabQuiz, which generates new vocabulary quizzes automatically
49.
Korean Drama Word List
• The word 미남 [minam] “handsome guy” can take several suffixes at once, e.g., -이시라구요 [-issilaguyo], forming a single word meaning “someone said that he is handsome”.
• We extract only the root word 미남 [minam] and count it as one word type.
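Counting root words rather than surface forms can be sketched in Python as follows. The slide does not say how roots are extracted. This sketch swaps in naive longest-suffix stripping with a hypothetical suffix list, purely for illustration. A real pipeline would use a Korean morphological analyzer, because naive stripping also mangles roots that happen to end in a suffix-like syllable.

```python
from collections import Counter

# Hypothetical suffix list for illustration only; a real pipeline would use a
# Korean morphological analyzer instead of naive suffix stripping.
SUFFIXES = ["이시라구요", "이에요", "이다", "이"]


def root_of(word, suffixes=SUFFIXES):
    """Strip the longest matching suffix, keeping at least one character of the root."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def count_word_types(words):
    """Count word types by root, so inflected forms share one entry."""
    return Counter(root_of(w) for w in words)


print(count_word_types(["미남이시라구요", "미남이다", "미남"]))  # Counter({'미남': 3})
```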
54.
Generate the Distractors
• Distractor 1: choose among the 5th to 20th closest words (cosine similarity)
• Distractor 2: use Distractor 1 as the negative word and the question word as the positive word, then choose among the 1st to 20th closest words (cosmul)
References:
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. In ICLR.
• Omer Levy and Yoav Goldberg. 2014. Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL.
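The two distractor rules can be sketched in Python as rankings over word vectors. The cosmul score follows the 3CosMul form from Levy and Goldberg (2014), with one positive word and one negative word, and with cosines shifted to [0, 1]. Several parts are assumptions. Picking uniformly at random within each rank window is assumed, because the slide does not say how a word is chosen from the window. The tiny 2-D vectors are hypothetical; real quizzes would use trained embeddings.

```python
import math
import random


def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranked_by_cosine(query, vectors):
    """All other words, most similar to `query` first."""
    q = vectors[query]
    return sorted((w for w in vectors if w != query),
                  key=lambda w: cos(q, vectors[w]), reverse=True)


def ranked_by_cosmul(positive, negative, vectors, eps=1e-3):
    """3CosMul-style ranking with one positive and one negative word.

    Cosines are shifted to [0, 1] so the ratio stays non-negative.
    """
    p, n = vectors[positive], vectors[negative]

    def score(w):
        sim_pos = (1 + cos(vectors[w], p)) / 2
        sim_neg = (1 + cos(vectors[w], n)) / 2
        return sim_pos / (sim_neg + eps)

    return sorted((w for w in vectors if w not in (positive, negative)),
                  key=score, reverse=True)


def pick_distractors(question, vectors, rng):
    # Distractor 1: a random pick (assumption) from the 5th-20th closest words by cosine.
    d1 = rng.choice(ranked_by_cosine(question, vectors)[4:20])
    # Distractor 2: question word as positive, distractor 1 as negative; 1st-20th closest.
    d2 = rng.choice(ranked_by_cosmul(question, d1, vectors)[:20])
    return d1, d2


# Hypothetical toy 2-D "embeddings": word k points at an angle of 3k degrees.
vectors = {f"w{k}": [math.cos(math.radians(3 * k)), math.sin(math.radians(3 * k))]
           for k in range(25)}
print(ranked_by_cosine("w0", vectors)[4:20])
print(pick_distractors("w0", vectors, random.Random(0)))
```

Skipping the top 4 cosine neighbours is the kind of rule that keeps near-synonyms of the answer out of the options. The negative term in cosmul pushes Distractor 2 away from Distractor 1, so the two wrong answers are less alike.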
55.
Language Learners Like Quizzes!!
• 60,000+ quizzes taken
• 35,000+ unique users completed the quiz
• 16% of users repeated the quiz
56.
Word Frequency is a Good Indicator of Difficulty
[Chart: Easy, Medium, and Hard compared; y-axis from 0 to 10]
Easy = frequent words
Medium = less frequent words
Hard = least frequent words
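The Easy/Medium/Hard split by word frequency can be sketched in Python as ranking word types by corpus count and cutting the ranking into bands. Equal-sized thirds are an assumption. The slide only says that frequent words are easy and the least frequent words are hard. The token list is a hypothetical toy.

```python
import math
from collections import Counter


def difficulty_buckets(tokens):
    """Split word types into Easy / Medium / Hard thirds by corpus frequency.

    Equal-sized thirds are an assumption, not the slide's stated rule.
    """
    ranked = [word for word, _ in Counter(tokens).most_common()]  # most frequent first
    size = math.ceil(len(ranked) / 3)
    return {
        "Easy": ranked[:size],
        "Medium": ranked[size:2 * size],
        "Hard": ranked[2 * size:],
    }


# Hypothetical toy corpus: "a" appears 6 times, "b" 5 times, ..., "f" once.
subtitle_tokens = ["a"] * 6 + ["b"] * 5 + ["c"] * 4 + ["d"] * 3 + ["e"] * 2 + ["f"]
print(difficulty_buckets(subtitle_tokens))
# {'Easy': ['a', 'b'], 'Medium': ['c', 'd'], 'Hard': ['e', 'f']}
```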
61.
Web Search
• The metadata of the site
• The metadata of the page
• The word tokens in the page
• The topic of the page
• The originality of the page
• Hyperlinks (PageRank)
In-Video Search
• The metadata of the video
• The metadata of this clip (timestamp, length, URI, etc.)
• The caption text of the clip
• The frames & audio signal
• The complexity of the sentence
• The diversity of the clips
[Diagram: signals grouped into site identifier, page identifier, search relevancy, and content ranking for web search; video identifier, clip identifier, search relevancy, and content ranking for in-video search]
* Images from Rakuten VIKI
62.
Job:
• Make some data ready for consumption.
Questions:
• How does the data come?
• What needs to be done for it to be ready?
• How will the data be consumed?
[Diagram components: data provider; raw data (FTP, API); trigger/monitor function; pre-processing function; database; data access function; data consumer]
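The components listed above can be wired together in a minimal Python sketch. The wiring order (provider → trigger → pre-processing → database → data access → consumer) is assumed from the component names. SQLite stands in for the database. The functions `preprocess`, `on_raw_data_arrived`, and `fetch_items`, as well as the `items` table, are hypothetical.

```python
import sqlite3


def preprocess(raw_lines):
    """Hypothetical pre-processing function: strip whitespace and drop empty lines."""
    return [line.strip() for line in raw_lines if line.strip()]


def on_raw_data_arrived(conn, source_id, raw_lines):
    """Hypothetical trigger/monitor function, called when the provider delivers raw data."""
    rows = [(source_id, i, text) for i, text in enumerate(preprocess(raw_lines))]
    conn.executemany("INSERT INTO items (source_id, position, text) VALUES (?, ?, ?)", rows)
    conn.commit()


def fetch_items(conn, source_id):
    """Hypothetical data access function: the only interface the consumer uses."""
    cur = conn.execute(
        "SELECT text FROM items WHERE source_id = ? ORDER BY position", (source_id,))
    return [text for (text,) in cur]


conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE items (source_id TEXT, position INTEGER, text TEXT)")
on_raw_data_arrived(conn, "episode-1", ["  你是辣妹 \n", "", "也是名门贵族"])
print(fetch_items(conn, "episode-1"))  # ['你是辣妹', '也是名门贵族']
```

Keeping the consumer behind a data-access function means the raw format or the pre-processing can change without touching the consumer.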
63.
Job:
• Let outsiders use a function.
Questions:
• How frequently will the function be used?
• What data does the function need?
[Diagram components: web application; API endpoint; API cache; request queue; application logic; application cache; internal/external data]
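Part of this design can be sketched in Python. An API cache absorbs repeated calls, and a request queue serializes the remaining ones toward the application logic. `ApiEndpoint` and `lookup_definition` are hypothetical names. The sketch does not model the application cache or the internal/external data source. If the application logic raises an exception, the single worker thread stops.

```python
import queue
import threading


class ApiEndpoint:
    """Hypothetical API endpoint: an API cache in front of a request queue,
    drained by one worker thread that runs the application logic."""

    def __init__(self, application_logic):
        self._logic = application_logic
        self._cache = {}
        self._cache_lock = threading.Lock()
        self._requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        # Requests are processed one at a time, so bursts wait in the queue
        # instead of overloading the application logic.
        while True:
            arg, reply = self._requests.get()
            reply.put(self._logic(arg))
            self._requests.task_done()

    def call(self, arg):
        with self._cache_lock:
            if arg in self._cache:
                return self._cache[arg]  # cache hit: no work for the backend
        reply = queue.Queue(maxsize=1)
        self._requests.put((arg, reply))
        result = reply.get()  # wait for the worker's answer
        with self._cache_lock:
            self._cache[arg] = result
        return result


calls = []


def lookup_definition(word):
    calls.append(word)  # record how often the application logic really runs
    return {"辣妹": "hot chick"}.get(word, "?")


api = ApiEndpoint(lookup_definition)
print(api.call("辣妹"), api.call("辣妹"), len(calls))  # hot chick hot chick 1
```

How often the function is called decides whether the cache and queue are worth their complexity. A rarely used function could call the application logic directly.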
69.
Fast iteration in R&D won’t be possible if we had many things bundled or coupled.
-- Pang
Vocab Quiz
• https://languagequiz.viki.com/
Learn Mode (PC/Mac only)
• https://www.viki.com/collections/316981l-learn-the-basics-chinese
• https://www.viki.com/collections/316939l-learn-the-basics-korean
Motion Dictionary
• TBD