In this talk, Chengqing Zong presents work at CASIA on developing statistical machine translation (MT) systems based on the open-source toolkit Moses. In recent years, CASIA has developed several MT systems, including Chinese-to-English, English-to-Chinese, Japanese-to-Chinese, Arabic-to-Chinese, Uigur-to-Chinese and Tibetan-to-Chinese systems. Moses is the basic translation engine in these systems. Chengqing shows the audience how CASIA uses and extends Moses to develop its multilingual MT systems.
This presentation is part of the MosesCore project, which encourages the development and use of open-source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission under Grant Number 288487 of the 7th Framework Programme.
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Beijing, Chengqing Zong, Casia, 23 April 2012
1. How We Use Moses to Develop Our Multi-lingual Machine Translation Systems?
Chengqing ZONG (宗成庆)
Institute of Automation, Chinese Academy of Sciences
中国科学院自动化研究所
cqzong@nlpr.ia.ac.cn
95 Zhongguancun East Road, Haidian District, Beijing 100190
http://www.nlpr.ia.ac.cn/cip/cqzong.htm Tel: +86-10-6255 4263
2. Outline
1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling
3. 1. Brief Introduction to Our Work
Our group works on machine translation (MT) research and system development at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA).
• 6 staff members
• 8 Ph.D. candidates, 1 Master's student
• 5 visiting scholars
5. 1. Brief Introduction to Our Work
[Figure: multilingual text-to-text translation system; surviving labels: Japanese, Chinese]
6. 1. Brief Introduction to Our Work
• In the spoken language translation (SLT) evaluation organized by IWSLT 2007, our system's CE (Chinese-to-English) clean-text translation performed best according to the human rankings.
11. 1. Brief Introduction to Our Work
• In the MT evaluation organized by the China Workshop on Machine Translation (CWMT) 2011 (Sept. 23-24), our system participated in all tasks:
1. Chinese to English (News domain, progress)
2. English to Chinese (News domain, progress)
3. English to Chinese (News domain, current)
4. English to Chinese (Science domain)
5. Japanese to Chinese (News domain)
6. Tibetan to Chinese (Government documents)
7. Mongolian to Chinese (Daily)
8. Uigur to Chinese (News domain)
9. Kazakh to Chinese (News domain)
10. Kyrgyz to Chinese (News domain)
19 organizations and 165 systems participated in this evaluation.
12. 1. Brief Introduction to Our Work
According to BLEU scores, our system ranked first in the following 5 tasks:
• English to Chinese (News domain, progress)
• Japanese to Chinese (News domain)
• Tibetan to Chinese (Government documents)
• Mongolian to Chinese (Daily)
• Kyrgyz to Chinese (News domain)
It ranked second in the following 4 tasks:
• Chinese to English (News domain, progress)
• English to Chinese (News domain, current)
• Uigur to Chinese (News domain)
• Kazakh to Chinese (News domain)
13. Outline
1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling
14. 2. Main Features of Moses
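• The basic idea of statistical machine translation (SMT) can be formulated in principle as
$$e_{\text{best}} = \arg\max_{e} \; p(f \mid e) \times p_{LM}(e) \times \omega^{\text{length}(e)}$$
It is now usually implemented as a log-linear model:
$$e_{\text{best}} = \arg\max_{e} \sum_{i} \lambda_i \, h_i(e, f)$$
where $\lambda_i$ is a weight and $h_i(e, f)$ is a feature.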
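As a minimal sketch of the log-linear model mentioned above, the snippet below scores each candidate translation as a weighted sum of feature values and keeps the best one. The feature names, weights and candidate values are invented for illustration; they are not taken from Moses or from our systems.

```python
import math

def loglinear_score(features, weights):
    """Return the weighted sum of feature values for one candidate."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))

# Hypothetical weights; in practice they are tuned on development data.
weights = {"tm": 1.0, "lm": 0.5, "word_penalty": -0.2}

# Hypothetical candidates with log-probability features.
candidates = [
    {"text": "parts of Europe hit by floods",
     "features": {"tm": math.log(0.20), "lm": math.log(0.010), "word_penalty": 6}},
    {"text": "Europe parts of hit by floods",
     "features": {"tm": math.log(0.20), "lm": math.log(0.001), "word_penalty": 6}},
]

best = best_candidate(candidates, weights)
print(best["text"])
```

Because both candidates share the same translation-model and length features here, the language-model feature alone decides the ranking.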
15. 2. Main Features of Moses
Some useful features include:
• Phrase translation probability
• Lexical phrase translation probability
• Inverse phrase translation probability
• Inverse lexical phrase translation probability
• N-gram English language model
• English sentence length penalty
• Chinese phrase count penalty
16. 2. Main Features of Moses
A phrase-based example:
Source: 欧洲 部分 地区 遭受 洪水 袭击
(1) Segment into phrases: 欧洲 部分 地区 遭受 洪水 袭击
(2) Translate each phrase: Europe parts of hit by floods
(3) Reorder: parts of Europe hit by floods
17. 2. Main Features of Moses
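[Figure: The Framework. Boxes shown: parallel data, development data, Moses training, translation model, test data, Moses decoder, target translation, Moses evaluation, "good or bad". Flow: training data → Moses training → translation model → Moses decoder (with test data) → target translation → Moses evaluation → good or bad.]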
18. 2. Main Features of Moses
• Offers two types of translation models: phrase-based and tree-based
• Supports factored translation models
• Allows decoding of different kinds of input: sentences, confusion networks and word lattices
19. 2. Main Features of Moses
• Supports n-best translation output in addition to the single best one
  - This is a good conference.
  - This was a great conference.
  - It is a good meeting.
  - ……
• Provides an experiment management system
• Translates fast with good translation quality
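An n-best list like the one above is typically written by Moses with one hypothesis per line and fields separated by `|||` (sentence id, translation, feature scores, total score). The sketch below parses lines in that layout; the sample lines, feature labels and scores are invented for illustration.

```python
from collections import defaultdict

def parse_nbest(lines):
    """Group n-best hypotheses by sentence id.

    Assumes each line looks like: id ||| translation ||| features ||| total score
    """
    nbest = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 4:
            continue  # skip malformed lines
        sent_id, text, total = int(fields[0]), fields[1], float(fields[3])
        nbest[sent_id].append((total, text))
    # Sort each list so the highest-scoring hypothesis comes first.
    return {i: sorted(h, reverse=True) for i, h in nbest.items()}

# Hypothetical n-best lines for sentence 0.
sample = [
    "0 ||| This is a good conference . ||| lm: -12.1 tm: -3.2 ||| -8.4",
    "0 ||| This was a great conference . ||| lm: -13.0 tm: -3.5 ||| -9.1",
    "0 ||| It is a good meeting . ||| lm: -12.8 tm: -4.0 ||| -9.6",
]
parsed = parse_nbest(sample)
print(parsed[0][0][1])
```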
20. 2. Main Features of Moses
• Balancing speed and quality
  - If we want translation speed, Moses provides many options to accelerate the translation process, such as the beam size and the granularity of the translation rules.
  - If we pursue translation quality, Moses also allows us to enlarge the translation search space so that we have a better chance of obtaining a better translation.
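The speed/quality trade-off can be illustrated with a toy beam search. This is only a sketch of the general idea, not the Moses decoder: the translation options, their scores and the bigram bonus are all hypothetical, and `beam_size` simply limits how many partial hypotheses survive at each step.

```python
def beam_search(options, bigram_bonus, beam_size):
    """Monotone decoding: choose one option per source position, left to right."""
    beam = [(0.0, [])]  # (score so far, chosen words)
    for position_options in options:
        expanded = []
        for score, words in beam:
            for word, word_score in position_options:
                prev = words[-1] if words else "<s>"
                new_score = score + word_score + bigram_bonus.get((prev, word), 0.0)
                expanded.append((new_score, words + [word]))
        # Pruning: keep only the beam_size best partial hypotheses.
        beam = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam_size]
    return beam[0]

# Hypothetical translation options (word, log-score) for two source positions.
options = [
    [("good", -1.0), ("great", -1.2)],
    [("meeting", -1.0), ("conference", -1.1)],
]
# Hypothetical language-model bonus for one fluent word pair.
bigram_bonus = {("great", "conference"): 1.0}

fast = beam_search(options, bigram_bonus, beam_size=1)
thorough = beam_search(options, bigram_bonus, beam_size=2)
print(fast[1], thorough[1])
```

With `beam_size=1` the search commits to the locally best first word and misses the better overall hypothesis that the wider beam finds, at the cost of expanding more hypotheses.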
21. 2. Main Features of Moses
• It now includes more, and better, translation models
  - Hierarchical Phrase-based Translation Model (HPB)
  - Tree-to-Tree/String-to-Tree Translation Models
• It provides more new features, such as faster language modeling, multi-threaded decoding and client-server translation
It keeps improving ……
22. 2. Main Features of Moses
• Moses provides good documentation and a friendly interface
• We can upgrade the components when we need to
• We can develop hybrid translation methods within the Moses framework
It allows extension ……
23. Outline
1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling
24. 3. How We Use Moses?
• Moses facilitates our research work
  - For beginners in SMT
  - For researchers familiar with SMT
  - For engineers building an SMT system
25. 3. How We Use Moses?
• For beginners in SMT:
  - For most beginners, Moses is the freshest and most vivid tutorial, giving them an intuitive feel for SMT;
  - Its detailed guidance is easy for beginners to follow;
  - It provides a preliminary understanding of the modules involved in an SMT system;
  - It helps beginners quickly find the research topics in SMT that interest them.
26. 3. How We Use Moses?
We use Moses as a tutorial tool.
27. 3. How We Use Moses?
• For researchers familiar with SMT:
  - Moses provides the whole toolkit for building a translation system: data preparation, word alignment, translation rule extraction, parameter tuning, decoding and evaluation
  - We only need to study the sub-models we are interested in, propose new algorithms, and then verify their effectiveness using Moses
28. 3. How We Use Moses?
• For example, we proposed a new algorithm for word alignment and translation rule extraction
• Moses helps us verify the effectiveness of the proposed methods in just a few days, which greatly accelerates our research
• Most importantly for MT researchers, Moses has become a de facto standard baseline against which to test their own models
29. 3. How We Use Moses?
We develop new models to compare with Moses, and we propose new algorithms to implement on the Moses platform.
30. 3. How We Use Moses?
[Figure: translation pyramid between source language and target language. Levels from bottom to top: word-based model; phrases (phrase-based); formalism grammar (hierarchical phrase-based); syntax (string-to-tree, tree-to-string, tree-to-tree); semantic; interlingua.]
31. 3. How We Use Moses?
• For engineers building an SMT system:
  - They do not need to understand the principles of how Moses works
  - They only need to provide training data, development data and test data
  - They do some pre-processing to clean the data
  - They do some post-processing to convert the output
32. [Figure: system combination pipeline. Source sentence → pre-processing → parallel MT engines (labels: "MT engine 1", "Moses", "MT engine 2", …, "MT engine 6") → one n-best list per engine → merged n-best list → MBR decoder → references for alignment → word aligning → merging alignments → confusion network (C.N) → decoder based on confusion network → translation.]
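The MBR decoder step over the merged n-best list can be sketched as follows. This is a minimal illustration under stated assumptions: the slide does not say which similarity measure is used, so unigram F1 stands in for a metric such as BLEU, and the hypotheses and probabilities are invented.

```python
from collections import Counter

def overlap_f1(hyp, ref):
    """Unigram F1 between two tokenized sentences (a simple stand-in for BLEU)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_select(hypotheses):
    """Return the hypothesis with the highest expected similarity to all others.

    hypotheses: list of (text, probability) pairs from the merged n-best list.
    """
    def expected_gain(candidate):
        return sum(p * overlap_f1(candidate, other) for other, p in hypotheses)
    return max((text for text, _ in hypotheses), key=expected_gain)

# Hypothetical merged n-best list with normalized probabilities.
merged = [
    ("it is a good meeting", 0.40),
    ("this is a good conference", 0.35),
    ("this was a good conference", 0.25),
]
print(mbr_select(merged))
```

In this toy case the selected hypothesis is not the most probable one: it wins because it agrees most with the rest of the list, which is why such a consensus hypothesis is a natural reference for the later alignment step.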
33. 3. How We Use Moses?
We also use Moses as a tool to evaluate the quality of a collected parallel corpus, because we can build an MT system on the corpus in two or three days and then evaluate its translation quality. We know that translation quality is a good reflection of corpus quality.
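As a rough sketch of how translation quality might be measured in such a check, the snippet below computes a simplified corpus-level BLEU score with one reference per sentence and no smoothing. It is an illustration only; real evaluations use the standard scoring scripts, and the example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (single reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum((h_counts & r_counts).values())  # clipped counts
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)

# Hypothetical system outputs compared against one reference.
perfect = corpus_bleu(["a b c d e"], ["a b c d e"])
flawed = corpus_bleu(["a b c d x"], ["a b c d e"])
print(perfect, flawed)
```

A system trained on a noisier corpus would be expected to score lower on the same test set, which is the signal used to judge the corpus.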
34. 3. How We Use Moses?
For example,
1-1 merkezdiki dölet apparatliri bilen jaylardiki dölet
apparatlirining xizmet hoquqi merkezning bir tutash
rehberlikide jaylarning teshebbuskarliqi we aktipliqini toluq
jari qildurush prinsipi boyiche ayrilidu.
1-2 中央和地方的国家机构职权的划分,遵循在中央的统一
领导下,充分发挥地方的主动性、积极性的原则。
2-1 madda jungxua xelq jumhuriyitide hemme millet
bapbarawer.
2-2 中华人民共和国各民族一律平等。
……
35. 3. How We Use Moses?
Many systems participating in MT evaluations around the world employ Moses, for example in the NIST, WMT, IWSLT and CWMT evaluations.
36. 3. How We Use Moses?
Systems   Use Moses?
DCU       √
DFKI      √
FBK       √
KIT       √
LIG       √
LIMSI
LIUM      √
MIT
MSR
NICT      √
RWTH
7 of the 11 systems in the SLT evaluation of IWSLT 2011 employed Moses!
37. 3. How We Use Moses?
Systems   Use Moses?
DCU       √
NTT       √
Systran   √
ICT-CAS   √
IA-CAS    √
IS-CAS    √
NEU
XAUT      √
ISTIC
XJIPC     √
HIT       √
IMNU      √
FRDC      √
BUAA      √
XMU       √
IIM       √
NJU
BJTU      √
XJU       √
16 of the 19 systems in the MT evaluation of CWMT 2011 employed Moses!
38. Outline
1. Brief Introduction to Our Work
2. Main Features of Moses
3. How We Use Moses?
4. Our Feeling
39. 4. Our Feeling
• Moses is our friend
  - It is a good helper and saves us a lot of labor
  - It is a good mirror that reflects the quality of our MT systems
  - It is a booster for MT research
We love our friend!
40. 4. Our Feeling
• Moses is our competitor
  - As MT researchers, we hope to develop new translation models that surpass Moses
  - Competition drives us to make progress
We love our competitor!
We love Moses!