7. Introduction to Machine Translation and Phrase-Based Machine Translation
http://ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf
Aleš Tamchyna (UFAL)
8. MT Talks
Ondřej Bojar
• http://mttalks.ufal.cz/
• Mini-lectures on MT
• Coding Exercises that complement the lectures
9. MT Talks
Ondřej Bojar
• Intro: Why is MT difficult, approaches to MT.
• MT that Deceives: Serious translation errors even for short and simple inputs.
• Pre-processing: Normalization and other technical tricks bound to help your MT system.
• MT Evaluation in General: Techniques of judging MT quality, dimensions of translation quality, number of possible translations.
• Automatic MT Evaluation: Two common automatic MT evaluation methods, PER and BLEU.
• Data Acquisition: The need for and possible sources of training data for MT; the diminishing utility of new data additions due to Zipf's law.
• Sentence Alignment: An introduction to the Gale & Church sentence alignment algorithm.
• Word Alignment: Cutting the chicken-egg problem.
• Phrase-based Model: Copy if you can.
• Constituency Trees: Divide and conquer.
• Dependency Trees: Trees with gaps.
• Rich Vocabulary: Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.
• Scoring and Optimization: Features your model features.
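As an illustration of the PER metric named in the Automatic MT Evaluation lecture, here is a minimal sketch of position-independent error rate. It assumes whitespace tokenization and a single reference, and uses one common formulation, (max(|ref|, |hyp|) - matches) / |ref|; the function name `per` is chosen here for illustration and is not taken from the lectures.

```python
from collections import Counter


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: word errors with word order ignored."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # Multiset intersection: each reference token can be matched at most once.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    # Reordering is not penalized, substitutions are.
    print(per("the cat sat on the mat", "on the mat the cat sat"))
    print(per("the cat sat", "the dog sat"))
```

Because the metric ignores word order, a scrambled but lexically complete hypothesis scores as error-free, which is one reason it is usually reported alongside order-sensitive metrics such as BLEU.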
10. Language Modelling
Kenneth Heafield (University of Edinburgh)
http://ufal.mff.cuni.cz/mtm15/files/09-language-modelling-kenneth-heafield.pdf
LM = 50% of CPU
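To make concrete what a decoder asks of a language model, the following is a minimal sketch of a bigram model with add-one smoothing. It is a stand-in for illustration only: the function names are hypothetical, the smoothing is far simpler than what production toolkits use, and it says nothing about how the CPU figure above was measured.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])           # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Natural-log probability of a sentence under add-one smoothing."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


if __name__ == "__main__":
    model = train_bigram(["a b", "a c"])
    print(log_prob("a b", model), log_prob("b a", model))
```

Each hypothesis extension during decoding triggers lookups like the ones in `log_prob`, so the number of such queries grows with the size of the search space.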
12. Deep Syntactic MT and TectoMT
Martin Popel (UFAL)
http://ufal.mff.cuni.cz/mtm15/files/12-deep-syntactic-mt-and-tectomt-martin-popel.pdf
13. Deep Syntactic MT and TectoMT
• 1.2s per sentence
• Worst in WMT 2015
• Depfix can detect and fix negation, but it mostly tries to fix morphological agreement
• Originally CS-EN but within QTLeap adapted to CS-EN, EN-ES, EN-NL, EN-PT, EN-EU
• 67% of errors come from transfer, 30% from analysis
14. Syntax-Based Models and Decoding
http://ufal.mff.cuni.cz/mtm15/files/19a-syntax-based-models-hieu-hoang.pdf
http://ufal.mff.cuni.cz/mtm15/files/19b-cyk-hieu-hoang.pdf
Hieu Hoang (New York University, Abu Dhabi)
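As background for the CYK slides linked above, here is a minimal sketch of a plain CYK recognizer for a grammar in Chomsky normal form. It is not the decoder described in the talk: there are no translation rules, scores, or pruning, and the toy grammar and the function name `cyk_recognize` are assumptions made for illustration.

```python
def cyk_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start`.

    lexical: maps a word to the set of nonterminals that produce it.
    binary:  maps a pair (B, C) to the set of nonterminals A with A -> B C.
    """
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the nonterminals that span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical.get(word, ()))
    for width in range(2, n + 1):              # span length, shortest first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= binary.get((left, right), set())
    return start in chart[0][n]


if __name__ == "__main__":
    lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
    binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
    print(cyk_recognize("she eats fish".split(), lexical, binary))
```

The three nested loops over span width, start position, and split point give the cubic dependence on sentence length that chart-based decoders have to manage.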
16. Real-World Application of a Machine Translation Workflow
• Cost is not the most important driver; speed (shorter turnaround time) is.
• Pricing is part of the business relationship. MT usage is only one of many driving factors.
18. Neural Network Models and Google Translate
Keith Stevens (Google)
http://ufal.mff.cuni.cz/mtm15/files/11-neural-network-models-and-google-translate-keith-stevens.pdf
26. Text Representations for NLP and MT
Hinrich Schütze
• Reduce sparseness with morphological analysis for better machine translation
• MarMoT, a fast and accurate morphological tagger: http://cistern.cis.lmu.de/marmot/
27. Text Representations for NLP and MT
Hinrich Schütze
• Use embeddings for lemmata, not for word forms
• Embeddings and morphological resources provide complementary information: use both!
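The two recommendations above can be sketched as a lookup that maps each word form to its lemma, fetches the embedding of the lemma, and keeps the morphological features alongside it. Everything in this sketch is a hypothetical toy: the `MORPH` table stands in for a real morphological analyzer, the vectors in `LEMMA_EMBEDDINGS` are made up, and `represent` is a name chosen for illustration.

```python
# Hypothetical toy resources. A real system would use a morphological
# tagger or lemmatizer and trained embeddings instead.
MORPH = {
    "house": ("house", "Number=Sing"),
    "houses": ("house", "Number=Plur"),
    "ran": ("run", "Tense=Past"),
    "running": ("run", "VerbForm=Ger"),
}
LEMMA_EMBEDDINGS = {"house": (0.1, 0.9), "run": (0.8, 0.2)}


def represent(form):
    """Return (embedding of the lemma, morphological features) for a word form."""
    key = form.lower()
    # Unknown forms fall back to themselves as lemma, with no features.
    lemma, features = MORPH.get(key, (key, None))
    return LEMMA_EMBEDDINGS.get(lemma), features


if __name__ == "__main__":
    print(represent("houses"))
    print(represent("house"))
```

All inflected forms of a lemma share one vector, which reduces sparseness, while the feature string preserves the distinction that the shared vector discards.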
30. Text Representations for NLP and MT
Hinrich Schütze
• Use lemmata for MT
• Use embeddings for MT
• Use linguistic morphological resources for MT
• Don’t represent sentences as vectors for MT
• Deep learning will not replace other MT work ...
• ... but will be a powerful component of MT systems.
33. eman
What is eman?
• A tool for managing pipelines of steps.
• Purpose independent, but bundled with an ecosystem of tools for machine translation.
• Written in Perl 5, runs on Linux (and probably other Unices).
• Tasks submitted locally or using SGE cluster.
Key Features
• Create complex experiment pipelines.
• Clone whole experiments or individual steps.
• Re-use existing steps when possible.
• Automatically resolve complex step dependencies.
• Seamlessly share steps with others.
• Generate tables of results based on customizable rules.
• Easily scriptable and hacking friendly.
https://ufal.mff.cuni.cz/eman/
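The idea of resolving step dependencies and re-using existing steps can be illustrated with a small topological sort. This is a generic sketch and not eman's implementation or interface: the step names, the `STEPS` table, and the `pending_steps` function are all hypothetical, and the code relies only on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
STEPS = {
    "corpus": set(),
    "align": {"corpus"},
    "lm": {"corpus"},
    "tm": {"align"},
    "tune": {"tm", "lm"},
    "eval": {"tune"},
}


def pending_steps(deps, done=frozenset()):
    """Return the steps still to run, in an order that respects dependencies.

    Steps listed in `done` are treated as already finished and are skipped,
    which mimics re-using existing results.
    """
    order = TopologicalSorter(deps).static_order()
    return [step for step in order if step not in done]


if __name__ == "__main__":
    print(pending_steps(STEPS))
    print(pending_steps(STEPS, done={"corpus", "lm"}))
```

A step is scheduled only after everything it depends on, so cloning an experiment with one changed step needs to re-run only that step and the steps downstream of it.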
34. Box — Moses Suite on Amazon EC2
Here's what Box v2015-05-25 beta (current release) includes:
cdec/ Popular SMT framework: http://www.cdec-decoder.org
cmph/ Hashing library (for compact phrase tables in Moses): http://cmph.sourceforge.net
ducttape/ Experiment management system (for cdec): https://github.com/jhclark/ducttape
eigen3/ Linear algebra library (for cdec): http://eigen.tuxfamily.org/index.php?title=Main_Page
fast_align/ Word alignment tool: https://github.com/clab/fast_align
giza-pp/ Word alignment package (for Moses): http://www.statmt.org/moses/giza/GIZA++.html
kenlm/ Language modeling toolkit: http://kheafield.com/code/kenlm/
mgiza/ Multi-threaded Giza++: http://www.kyloo.net/software/doku.php/mgiza:overview
mosesdecoder/ Popular SMT framework: http://www.statmt.org/moses/
multeval/ MT evaluation tool: https://github.com/jhclark/multeval
rnnlm/ Neural network language modeling toolkit: http://rnnlm.org
salm/ Suffix-array toolkit for NLP (for Moses): https://github.com/moses-smt/salm
scala/ Programming language (for cdec): http://www.scala-lang.org
vowpal_wabbit/ Machine learning toolkit compatible with Moses: http://hunch.net/~vw/
word2vec/ Continuous word representations: https://code.google.com/p/word2vec/
http://www.boxresear.ch/
37. MT-ComparEval
Martin Popel
• Graphical evaluation interface for Machine Translation development
• Web-based tool for MT developers
• Check progress of a system over time or compare several MT systems
• Focus on analyzing system differences
• API for uploading translations
• Try it now - http://wmt.ufal.cz
• Install it - https://github.com/choko/MT-ComparEval/
38. MT-ComparEval
Martin Popel
• Online A = Bing?
• Online B = Google?
• systems = tasks
• Newest version of BLEU
• Some sentence-level smoothing
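The slide does not say which smoothing MT-ComparEval applies, so the following is only a sketch of one well-known option: sentence-level BLEU with add-one smoothing of the n-gram precisions for n > 1. It assumes whitespace tokenization and a single reference; the function names are chosen here for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n > 1 (an assumption)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        matched = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = sum(hyp_counts.values())
        if n > 1:                # smoothing: avoids zero higher-order precision
            matched += 1
            total += 1
        if matched == 0:         # only possible for unigrams
            return 0.0
        log_precision_sum += math.log(matched / total) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

Without smoothing, a single sentence with no matching 4-gram would score zero, which makes unsmoothed BLEU nearly useless for sentence-by-sentence comparison of systems.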
42. Joshua 6
• (New!) Phrase-based decoder (no OSM or lexical distortion)
• (New!) Language packs
50. Appraise++
An open-source system for manual evaluation of MT output.
It supports collaborative collection of human feedback for MT evaluation.
It implements tasks such as Translation Quality Checking, Ranking and Error Classification, and Manual Post-Editing.
http://appraise.cf/