This is the English version of my Machine Translation course program for the following course slides (in Russian):
http://www.slideshare.net/dmitrykan/introduction-to-machine-translation-2911038
and
http://www.slideshare.net/dmitrykan/introduction-to-machine-translation-1
Machine Translation course program
Brief description of the course:
There are two fundamental approaches to machine translation: the rule-based approach (built on formal
models of natural languages, such as dependency grammars) and the statistical approach (built on parallel
data). Each has its advantages: the rule-based approach is formal and structured, while the statistical
approach makes it possible to build and scale a system without studying the properties of a natural
language in depth. Each also has its problem areas: the rule-based approach is tied to a given language
or family of languages, while the statistical approach offers no control over subtle structures and
properties of a natural language, such as the generation of prepositions. Combining the two approaches
has recently attracted particular interest among researchers. The entire machine translation pipeline,
from formalization of the source language to word reordering on the target side, can be seen as a
testing ground for combining rules with statistics. This course introduces students to all the sub-tasks
of building a machine translation system with both approaches: formalization of natural language,
translation dictionaries, phrase translation, machine translation models, decoding and word reordering.
The course also presents formal semantic models of natural languages and their place in the topic.
Machine learning methods (such as structured prediction) are a further focus of the course. The course
material assumes knowledge of general higher mathematics and knowledge of, or interest in, natural
language processing. There will be some hands-on sessions with take-away knowledge, which assume
familiarity with formats, NLP algorithms and libraries.
Course topics
1. Introduction to MT. Motivation for its existence
2. Short history of MT, main phases. ALPAC report
3. MT systems triangle. Direct and indirect MT. Examples of MT systems
4. Current MT systems existing in the industry, main players
5. Existing software packages for natural language processing and building an MT system
6. Two fundamental approaches to MT: statistical and rule-based (classical)
7. Methods of MT
8. Direct MT system, its features, pros and cons
9. Transfer MT system, types of transfer methods, features
10. Notion of interlingua. Features of MT based on interlingua, its comparison with transfer
11. Statistical MT and its components
12. Example-based MT systems
13. Theory of statistical MT systems. Fundamental equation (Bayes' theorem). Notion of statistical language
model. MT model
14. Model of machine translation in statistical MT
15. Task of word alignment
16. Features of MT systems
17. Existing software components of statistical MT systems
18. Evaluation of MT systems: human evaluation and automatic metrics
19. BLEU score
20. METEOR score
21. NIST score
22. Round-trip evaluation method
23. Hybrid MT systems
24. Task of word reordering in a sentence on the target side. Rule-based and statistical approaches
25. Computer semantics of a natural language. MT system based on it
26. Pragmatics and context analysis on cross-sentence level
27. Practical details of software packages: GIZA++, SRILM, Moses
28. Method of structured prediction for learning machine translation models
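As a small illustration of topic 13, the fundamental equation of statistical MT chooses the target
sentence e that maximizes P(e) * P(f | e) for a source sentence f. The sketch below is a toy under
stated assumptions: the probabilities are made up, the German-English example is arbitrary, and the
names LANGUAGE_MODEL, TRANSLATION_MODEL, score and decode are hypothetical and not taken from the
course materials or from any MT toolkit.

```python
import math

# Made-up probabilities, for illustration only.
# LANGUAGE_MODEL[e] stands in for P(e): how fluent the target sentence is.
LANGUAGE_MODEL = {
    "the house is small": 0.020,
    "small the house is": 0.001,
    "the home is little": 0.005,
}

# TRANSLATION_MODEL[(f, e)] stands in for P(f | e): how well e explains the source f.
TRANSLATION_MODEL = {
    ("das Haus ist klein", "the house is small"): 0.30,
    ("das Haus ist klein", "small the house is"): 0.30,
    ("das Haus ist klein", "the home is little"): 0.10,
}


def score(source, candidate):
    """Return log P(e) + log P(f | e), or -inf if either factor is zero or unknown."""
    p_e = LANGUAGE_MODEL.get(candidate, 0.0)
    p_f_given_e = TRANSLATION_MODEL.get((source, candidate), 0.0)
    if p_e == 0.0 or p_f_given_e == 0.0:
        return float("-inf")
    return math.log(p_e) + math.log(p_f_given_e)


def decode(source):
    """Exhaustive 'decoder': pick the candidate with the highest combined score."""
    return max(LANGUAGE_MODEL, key=lambda e: score(source, e))


print(decode("das Haus ist klein"))
```

The two badly ordered and well ordered candidates share the same translation probability here, so the
language model alone decides between them. A real decoder searches a very large candidate space instead
of enumerating a fixed list.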
Seminar topics
1. Mathematics of statistical MT, paper [1]
2. Hierarchical model of statistical MT, paper [2]
3. Phrase-based statistical MT, paper [3]
4. Rule-based MT systems, papers [4,5]
5. Hybrid MT systems, based on examples, paper [6]
6. BLEU score in detail, paper [8]
7. Robust large-scale MT systems, based on examples, paper [9]
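For seminar topic 6, the following is a minimal sketch of the two ingredients of BLEU described in
paper [8]: clipped (modified) n-gram precision and the brevity penalty. It is simplified on purpose
and rests on assumptions: it scores one sentence against a single reference, uses uniform weights and
no smoothing, whereas the paper defines the metric at corpus level with multiple references. The
function names ngrams, modified_precision and bleu are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with counts clipped
    to the number of times each n-gram occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights, one reference, no smoothing."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # geometric mean is zero if any precision is zero
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


reference = "the cat is on the mat".split()
print(bleu("the cat is on the mat".split(), reference))
print(modified_precision("the the the the".split(), reference, 1))
```

The second call shows why clipping matters: repeating "the" four times earns credit for only the two
occurrences present in the reference.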
Bibliography
[1] Brown P., Della Pietra S., Della Pietra V., Mercer R.: The Mathematics of
Statistical Machine Translation: Parameter Estimation, 1993
[2] Chiang D.: A Hierarchical Phrase-Based Model for Statistical Machine
Translation, 2005
[3] Koehn P., Och F., Marcu D.: Statistical Phrase-Based Machine Translation, 2003
[4] Kaplan R., Netter K., Wedekind J., Zaenen A.: Translation By Structural
Correspondences, 1989
[5] Landsbergen J.: The Rosetta Project, 1989
[6] Groves D., Way A.: Hybrid Example-Based SMT: the Best of Both Worlds?
[7] Athanaselis T., Bakamidis S., Dologlou I.: Words Reordering based on Statistical
Language Model, 2006
[8] Papineni K., Roukos S., Ward T., Zhu W.-J.: BLEU: a Method for Automatic
Evaluation of Machine Translation, 2002
[9] Gough N., Way A.: Robust Large-Scale EBMT with Marker-Based Segmentation,
2004