This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates, follow us on Twitter - #MosesCore
TAUS MT SHOWCASE, Google Translator Toolkit, Patcharin Areewong, Google, 10 April 2013
1. TAUS
MACHINE
TRANSLATION
SHOWCASE
Google Translator Toolkit
14:20 – 14:40
Wednesday, 10 April 2013
Patcharin Areewong
Google
2. Google Confidential and Proprietary 2
Machine Translation
Patcharin Areewong
Language Specialist • Singapore
3. Agenda
• Introduction
• Google Translate
• Google Translator Toolkit
• Demo
4. Translation is the key to universal access
“Organize the world's information and make it
universally accessible and useful.”
kiana
привет
hallo
hello こんにちは
你好
hola مرحبا
!लो
bonjour
jambo
napaykullayki
olá mhoroi
5. A translator’s story…
Translator
2,000 words per day
250 days per year
500,000 words per year
6. The Google story…
Google
1,000,000,000 new pages indexed
per year*
400 words per page
400,000,000,000 words per year
400B / 0.5M = 800,000 translators
and that’s for only
1 additional language!
* actually, it’s several billion pages but let’s round it down to 1 billion
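The arithmetic behind the translator's story and the Google story can be checked with a short sketch. The figures are the ones on the slides; the variable names are illustrative assumptions, not part of the original deck.

```python
# Figures from the slides; names are hypothetical and chosen for readability.
WORDS_PER_DAY = 2_000            # one translator's daily output
WORKING_DAYS_PER_YEAR = 250
PAGES_PER_YEAR = 1_000_000_000   # rounded down to 1 billion, per the footnote
WORDS_PER_PAGE = 400

# Yearly output of a single translator.
words_per_translator = WORDS_PER_DAY * WORKING_DAYS_PER_YEAR

# Yearly volume of newly indexed text.
words_to_translate = PAGES_PER_YEAR * WORDS_PER_PAGE

# Translators needed to cover that volume in one additional language.
translators_needed = words_to_translate // words_per_translator

print(f"{words_per_translator:,} words per translator per year")
print(f"{words_to_translate:,} words indexed per year")
print(f"{translators_needed:,} translators for one additional language")
```

Changing PAGES_PER_YEAR to "several billion" scales the translator count linearly, which is the point of the footnote.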
7. So much content! So little resource!
Organize the world’s information and make it universally accessible and
useful.
8% of world population speaks Arabic. Only 1% of the web is in Arabic.
>50% of Google searches come from outside the United States.
Localized search in more than 55 countries, 35 languages.
Google UI available in 117 languages across 157 international domains.
>99% of what people write, say, or generate never leaves the language in
which it was created.
http://www.commonsenseadvisory.com/Resources/FactsandFigures/tabid/1213/Default.aspx
Madar research, 2007
9. Machine translation is better than nothing
• Handle overwhelming volume
• Provide immediate turnaround
• No access to human translator
• Users are one click away from inaccessible
content
10. Drive demand for human translation
Variable Value
Customer demand 1 vs. 10
Revenue per customer €10 vs. €100
An MT experiment can measure customer demand and profitability.
If the content proves valuable, it is worth the price of human
translation.
If you talk to a man in a language he understands, that goes to his head.
If you talk to him in his language, that goes to his heart.
– Nelson Mandela
11. Expand translation industry
Human translation expensive for content with uncertain value
• If 10 readers, expensive to pay $10,000 for translation
• If 100,000 readers, $10,000 translation worth the price
Monitor adoption, feedback
• Human-translate inscrutable content: offensive, misleading
• Human-translate high-traffic content: worth the price
Bottom line:
- Quick experiments
- Low cost
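The readership argument on this slide comes down to cost per reader. A minimal sketch, assuming the $10,000 translation price and the two audience sizes from the slide; the function name is hypothetical:

```python
TRANSLATION_COST_USD = 10_000  # human translation price quoted on the slide


def cost_per_reader(readers: int, cost: float = TRANSLATION_COST_USD) -> float:
    """Spread a fixed translation cost evenly over the number of readers."""
    return cost / readers


# The two audience sizes used on the slide.
for readers in (10, 100_000):
    print(f"{readers:>7,} readers -> ${cost_per_reader(readers):,.2f} per reader")
```

An MT pilot supplies the missing input, the reader count, before the human translation budget is committed.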
12. Target users: amateurs and advanced translators
Languages: 100,000+ pairs
MT languages: 88 (includes variants)
UI languages: 36
File types: AdWords, HTML, TXT, PO, Android App, Chrome Extension, RTF, subtitles, Word, ODT
General features: WYSIWYG editor, sharing, chat, spell check, publish to Google Docs
Advanced features: translation memory, glossary, Google dictionary, label, concordance, scoping, split/merge, repetition
13. Google: open translation ecosystem
“How can you be big without being evil? We don’t trap
end users. So if you don’t like Google, if for whatever
reason we do a bad job for you, we make it easy for you
to move to our competitor.” – Eric Schmidt
Data liberation!
Google translation follows the “data liberation” principle
• Support open standards, APIs
Data stored in Translator Toolkit is your data – not Google’s
• Cloud storage for access, scale
• Documents, glossaries, and TMs for your eyes only unless you share
• You control addition, deletion of your translations