MTM 2015

Tenth MT Marathon 2015
Prague, Czech Republic
September 7-12, 2015

MT Evaluation
Yvette Graham (DCU)
http://ufal.mff.cuni.cz/mtm15/files/01-mt-evaluation-yvette-graham.pdf

Introduction to Machine Translation and
Phrase-Based Machine Translation
http://ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf
Aleš Tamchyna (UFAL)

MT Talks
Ondřej Bojar
• http://mttalks.ufal.cz/
• Mini-lectures on MT
• Coding Exercises that complement the lectures

• Intro: Why is MT difficult, approaches to MT.
• MT that Deceives: Serious translation errors even for short and simple inputs.
• Pre-processing: Normalization and other technical tricks bound to help your MT system.
• MT Evaluation in General: Techniques of judging MT quality, dimensions of translation quality, number of possible translations.
• Automatic MT Evaluation: Two common automatic MT evaluation methods, PER and BLEU.
• Data Acquisition: The need for and possible sources of training data for MT; the diminishing utility of additional data due to Zipf's law.
• Sentence Alignment: An introduction to the Gale & Church sentence alignment algorithm.
• Word Alignment: Cutting the chicken-egg problem.
• Phrase-based Model: Copy if you can.
• Constituency Trees: Divide and conquer.
• Dependency Trees: Trees with gaps.
• Rich Vocabulary: Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.
• Scoring and Optimization: Features your model features.
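
The two automatic metrics named in the lecture list above can be sketched in a few lines. This is a minimal illustration, not the code from the MT Talks exercises: it assumes one common formulation of PER (bag-of-words matching with a penalty for extra hypothesis words) and shows only the clipped n-gram precision at the core of BLEU, without the brevity penalty or the geometric mean. The function names are hypothetical.

```python
from collections import Counter


def per(hyp, ref):
    """Position-independent error rate (one common formulation, assumed here).

    Both sentences are treated as bags of words, so word order is ignored.
    """
    if not ref:
        raise ValueError("reference must not be empty")
    # Multiset intersection: each reference word can be matched at most once.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    # Hypothesis words beyond the reference length count as errors.
    extra = max(0, len(hyp) - len(ref))
    return 1.0 - (matches - extra) / len(ref)


def clipped_precision(hyp, ref, n):
    """Clipped (modified) n-gram precision, the building block of BLEU."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum((hyp_ngrams & ref_ngrams).values())
    return matched / total


hyp = "the cat sat on the mat".split()
ref = "on the mat sat the cat".split()
print(per(hyp, ref), clipped_precision(hyp, ref, 2))
```

For this scrambled pair PER is 0.0, because every word matches regardless of position, while the bigram precision is 0.6, which is why BLEU-style n-gram matching is sensitive to word order and PER is not.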

Language Modelling
Kenneth Heafield (University of Edinburgh)
http://ufal.mff.cuni.cz/mtm15/files/09-language-modelling-kenneth-heafield.pdf
LM = 50% of CPU

Discriminative Training
Miloš Stanojević (ILLC, University of Amsterdam)
http://ufal.mff.cuni.cz/mtm15/files/10-discriminative-training-milos-stanojevic.pdf

Deep Syntactic MT and TectoMT
Martin Popel (UFAL)
http://ufal.mff.cuni.cz/mtm15/files/12-deep-syntactic-mt-and-tectomt-martin-popel.pdf

Deep Syntactic MT and TectoMT
• 1.2s per sentence
• Worst in WMT 2015
• Depfix can detect & fix negation, mostly tries to fix morphological agreement
• Originally CS-EN but within QTLeap adapted to CS-EN, EN-ES, EN-NL, EN-PT, EN-EU
• 67% errors from transfer, 30% from analysis

Syntax-Based Models and Decoding
http://ufal.mff.cuni.cz/mtm15/files/19a-syntax-based-models-hieu-hoang.pdf
http://ufal.mff.cuni.cz/mtm15/files/19b-cyk-hieu-hoang.pdf
Hieu Hoang (New York University, Abu Dhabi)

Real-World Application of a Machine
Translation Workflow
• Cost is not the most important driver; speed (shorter turnaround time) is.
• Pricing is part of the business relationship. MT usage is only one of many driving factors.

Neural Network Models and Google Translate
Keith Stevens (Google)
http://ufal.mff.cuni.cz/mtm15/files/11-neural-network-models-and-google-translate-keith-stevens.pdf

Text Representations for NLP and MT
Hinrich Schütze
• Reduce sparseness with morphological analysis for better machine translation
• MarMoT, a fast and accurate morphological tagger: http://cistern.cis.lmu.de/marmot/

Hinrich Schütze
• Use embeddings for lemmata, not for word forms
• Embeddings and morphological resources provide complementary information: use both!

Hinrich Schütze
• Use lemmata for MT
• Use embeddings for MT
• Use linguistic morphological resources for MT
• Don't represent sentences as vectors for MT
• Deep learning will not replace other MT work...
• ...but will be a powerful component of MT systems.

translate5
http://www.translate5.net/login
http://ufal.mff.cuni.cz/mtm15/files/03-translate5-lab-marc-mittag.pdf
Column-based approach on data

eman
What is eman?
• A tool for managing pipelines of steps.
• Purpose independent, but bundled with an ecosystem of tools for machine translation.
• Written in Perl 5, runs on Linux (and probably other Unices).
• Tasks submitted locally or using SGE cluster.
Key Features
• Create complex experiment pipelines.
• Clone whole experiments or individual steps.
• Re-use existing steps when possible.
• Automatically resolve complex step dependencies.
• Seamlessly share steps with others.
• Generate tables of results based on customizable rules.
• Easily scriptable and hacking friendly.
https://ufal.mff.cuni.cz/eman/
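
The ideas of resolving step dependencies and re-using existing steps can be illustrated generically. The sketch below is not eman's code or interface (eman is written in Perl); it is a small Python illustration that assumes a pipeline is given as a mapping from each step to the steps it depends on. The step names and the `plan` function are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan(deps, done):
    """Return the steps that still need to run, in a valid execution order.

    deps: mapping from step name to the set of steps it depends on.
    done: steps whose results already exist and can be re-used.
    """
    # static_order() yields every step after all of its dependencies.
    order = TopologicalSorter(deps).static_order()
    return [step for step in order if step not in done]


# Hypothetical MT experiment: names are for illustration only.
pipeline = {
    "corpus": set(),
    "align": {"corpus"},
    "lm": {"corpus"},
    "tm": {"align"},
    "tune": {"tm", "lm"},
    "eval": {"tune"},
}

# The corpus and the alignment already exist, so only the rest is scheduled.
print(plan(pipeline, done={"corpus", "align"}))
```

Cloning an experiment with one changed step then amounts to marking every step upstream of the change as done and re-planning.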

Box — Moses Suite on Amazon EC2
Here's what Box v2015-05-25 beta (current release) includes:
cdec/ Popular SMT framework: http://www.cdec-decoder.org
cmph/ Hashing library (for compact phrase tables in Moses): http://cmph.sourceforge.net
ducttape/ Experiment management system (for cdec): https://github.com/jhclark/ducttape
eigen3/ Linear algebra library (for cdec): http://eigen.tuxfamily.org/index.php?title=Main_Page
fast_align/ Word alignment tool : https://github.com/clab/fast_align
giza-pp/ Word alignment package (for Moses): http://www.statmt.org/moses/giza/GIZA++.html
kenlm/ Language modeling toolkit: http://kheafield.com/code/kenlm/
mgiza/ Multi-threaded Giza++ : http://www.kyloo.net/software/doku.php/mgiza:overview
mosesdecoder/ Popular SMT framework: http://www.statmt.org/moses/
multeval/ MT evaluation tool: https://github.com/jhclark/multeval
rnnlm/ Neural network language modeling toolkit: http://rnnlm.org
salm/ Suffix-array toolkit for NLP (for Moses): https://github.com/moses-smt/salm
scala/ Programming language (for cdec): http://www.scala-lang.org
vowpal_wabbit/ Machine learning toolkit compatible with Moses: http://hunch.net/~vw/
word2vec/ Continuous word representations: https://code.google.com/p/word2vec/
http://www.boxresear.ch/

Treex
http://ufal.mff.cuni.cz/treex

MT-ComparEval
Martin Popel
• Graphical evaluation interface for Machine Translation development
• web-based tool for MT developers
• check progress of a system over time or compare several MT systems
• focus on analyzing system differences
• API for uploading translations
• Try it now - http://wmt.ufal.cz
• Install it - https://github.com/choko/MT-ComparEval/

MT-ComparEval
Martin Popel
• Online A = Bing?
• Online B = Google?
• systems = tasks
• Newest version of BLEU
• Some sentence level smoothing
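
Sentence-level smoothing matters because plain BLEU drops to zero for any single sentence with no matching 4-gram. The sketch below shows one widely used scheme, add-one smoothing of the higher-order n-gram precisions; whether MT-ComparEval uses this exact scheme is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing for n >= 2 (assumed scheme)."""
    if not hyp:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        matched = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = sum(hyp_ngrams.values())
        if n == 1:
            if matched == 0:
                return 0.0  # no word overlap at all
            precision = matched / total
        else:
            # Add-one smoothing keeps the precision non-zero and defined
            # even when the hypothesis is shorter than n tokens.
            precision = (matched + 1) / (total + 1)
        log_sum += math.log(precision) / max_n
    # Brevity penalty for hypotheses that are not longer than the reference.
    if len(hyp) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_sum)


hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(smoothed_sentence_bleu(hyp, ref))
```

Because the scores of individual sentences depend on the smoothing scheme, they are comparable only between systems scored by the same tool.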

Joshua 6
• (New!) Phrase-based decoder (no OSM or lexical distortion)
• (New!) Language packs

CloudLM: a Cloud-based Language Model for
Machine Translation

Evaluating MT systems with BEER

Sampling Phrase Tables for the Moses
Statistical Machine Translation System

Appraise++
An open-source system for manual evaluation of MT output
It supports collaborative collection of human feedback for MT evaluation.
It implements tasks such as Translation Quality Checking, Ranking and
Error Classification, and Manual Post-Editing.
http://appraise.cf/

Segmentation-Aware Language Model

Deep Machine Translation
Workshop 2015
September 3-4, 2015

TectoMT Seminar 2015
September 3-4, 2015
100% Acceptance Rate!

Charles University in Prague
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
