Quality of Machine Translation Quality Estimation Open issues Conclusions
Estimativa da qualidade da tradução automática
(Machine translation quality estimation)
Lucia Specia
University of Sheffield
l.specia@sheffield.ac.uk
Faculdade de Letras da Universidade do Porto
13 May 2013
Outline
1 Quality of Machine Translation
2 Quality Estimation
3 Open issues
4 Conclusions
Introduction
Machine Translation:
Around since the early 1950s
Increasingly popular since 1990: statistical approaches
Software tools and data available to build translation systems - Moses and others
Increasing demand for cheaper and faster translations
How do we measure quality and progress over time?
So far... mostly automatic evaluation metrics
MT evaluation metrics
N-gram matching between system output and one or more reference translations: BLEU and many others
Issue 1: Too many possible good-quality translations; thousands of references would be needed to capture valid variations
Solution: HyTER (Language Weaver), an annotation tool to generate all possible correct translations! [DM12]
Translations built bottom-up from word/phrase translation equivalents using finite-state automata (FSA)
2-2.5 hours of expert annotation per sentence
One annotator: 5.2 × 10^6 paths
Several annotators combined: 8.5 × 10^11 paths
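To see how a compact network of word/phrase alternatives can encode so many translations, here is a toy sketch (not the HyTER tool itself) that counts the distinct paths through a small acyclic finite-state automaton. The automaton, its state names and the `count_paths` helper are assumptions made up for illustration.

```python
from functools import lru_cache

# Toy acyclic FSA: each state maps to a list of (label, next_state) arcs.
# States and labels are invented for illustration only.
FSA = {
    "q0": [("the", "q1"), ("this", "q1")],
    "q1": [("battery", "q2"), ("battery pack", "q2"), ("cell", "q2")],
    "q2": [("lasts", "q3"), ("runs for", "q3")],
    "q3": [("6 hours", "end"), ("six hours", "end")],
    "end": [],
}

def count_paths(fsa, start, final):
    """Count distinct arc sequences from start to final in an acyclic FSA."""
    @lru_cache(maxsize=None)
    def paths_from(state):
        if state == final:
            return 1
        # Sum over outgoing arcs; parallel arcs count as separate paths.
        return sum(paths_from(nxt) for _, nxt in fsa[state])
    return paths_from(start)

n_paths = count_paths(FSA, "q0", "end")
print(n_paths)  # 2 * 3 * 2 * 2 = 24
```

Every slot with alternatives multiplies the total, which is why the path counts grow so quickly with sentence length and number of annotators.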
MT evaluation metrics
Issue 2: Difficult to quantify severity of mismatching n-grams
ref: Do not buy this product, it’s their craziest invention!
sys: Do buy this product, it’s their craziest invention!
Some attempts to weight mismatches differently - sparse, lexicalised approach
However, the same error is more or less important depending on the user or purpose:
Severe if end-user does not speak source language
Trivial to post-edit by translators
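The ref/sys pair above can be checked with a minimal sketch of clipped n-gram precision, the matching step that BLEU-style metrics build on. This is a simplification: whitespace tokenisation, a single reference and no brevity penalty are assumptions, and `clipped_precision` is an illustrative helper, not a library function.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(sys_tokens, ref_tokens, n):
    """Fraction of system n-grams also found in the reference (counts clipped)."""
    sys_counts = Counter(ngrams(sys_tokens, n))
    ref_counts = Counter(ngrams(ref_tokens, n))
    matched = sum(min(c, ref_counts[g]) for g, c in sys_counts.items())
    total = sum(sys_counts.values())
    return matched / total if total else 0.0

ref = "Do not buy this product, it's their craziest invention!".split()
sys_out = "Do buy this product, it's their craziest invention!".split()

p1 = clipped_precision(sys_out, ref, 1)  # every system word is in the reference
p2 = clipped_precision(sys_out, ref, 2)  # only the bigram ("Do", "buy") is missing
print(p1, p2)
```

The unigram precision is 1.0 and the bigram precision is 6/7, even though the missing "not" reverses the meaning of the sentence.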
MT evaluation metrics
Conversely:
ref: The battery lasts 6 hours and it can be fully recharged in 30 minutes.
sys: Six-hours battery, 30 minutes to full charge last.
OK for gisting - meaning preserved
Very costly for post-editing if style is to be preserved
Task-based evaluation
Measure translation quality within a task. E.g. Autodesk - productivity test through post-editing [Aut11]
2-day translation and post-editing, 37 participants
In-house Moses (Autodesk data: software)
Time spent on each segment
Task-based evaluation
E.g.: Intel - User satisfaction with un-edited MT
Translation is good if customer can solve problem
MT for Customer Support websites [Int10]
Overall customer satisfaction: 75% for English→Chinese
95% reduction in cost
Project cycle from 10 days to 1 day
From 300 to 60,000 words translated/hour
Customers in China using MT texts were more satisfied with support than natives using original texts (68%)!
MT for chat and community forums [Int12]
∼60% “understandable and actionable” (→English/Spanish)
Max ∼10% “not understandable” (→Chinese)
Overview
Metrics either depend on references or post-editing/use of
translations (task-based)
Our proposal
Quality assessment without reference, prior to
post-editing/use of translations
Overview
Why don’t translators use (more) MT?
Translations are not good enough!
What about TMs? Aren’t fuzzy matches useful?
Framework
Quality estimation (QE): provide an estimate of
quality for new translated text *before* it is post-edited
Quality = post-editing effort
No access to reference translations: machine learning
techniques to predict post-editing effort scores
Considers interaction with TM systems: only used for
low fuzzy match cases, or to select between TM and MT
QTLaunchPad project
Multidimensional Quality Metrics for MT and HT, for manual
and (semi-)automatic evaluation (QE):
http://www.qt21.eu/launchpad/
Framework
[Diagram: QE framework]
Training: examples (source & translations, quality scores) + quality indicators → QE system
Prediction: source text → MT system → translation → QE system → quality score
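The training and prediction flow of the framework can be sketched in a few lines. This is a toy under stated assumptions, not the talk's actual system: a single hand-made indicator (`length_gap`), ordinary least squares with one feature, and invented training examples.

```python
def length_gap(source, translation):
    """Single toy indicator: relative difference in token counts."""
    s, t = len(source.split()), len(translation.split())
    return abs(s - t) / max(s, 1)

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x else 0.0
    return a, mean_y - a * mean_x

# Invented training examples: (source, translation, post-editing effort score).
examples = [
    ("a b c d", "w x y z", 1.0),
    ("a b c d", "w x y", 2.0),
    ("a b c d", "w x", 3.0),
]

# Training: indicators + quality scores -> QE model.
xs = [length_gap(s, t) for s, t, _ in examples]
ys = [score for _, _, score in examples]
a, b = fit_line(xs, ys)

def estimate_quality(source, translation):
    """Predict an effort score for a new translation; no reference needed."""
    return a * length_gap(source, translation) + b

print(round(estimate_quality("a b c d", "w"), 2))  # 4.0
```

A real QE system would use many indicators and a stronger learner; the point here is only that prediction needs the source and the MT output, never a reference translation.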
Examples of positive results
Post-editing speed for the subset of sentences predicted as “good” (low effort) vs a random subset of sentences:
Language   no QE            QE
fr-en      0.75 words/sec   1.09 words/sec
en-es      0.32 words/sec   0.57 words/sec
Accuracy in selecting the best translation among 4 MT systems:
Best MT system   Highest QE score
54%              77%
State-of-the-art
Quality indicators:
Source text: complexity indicators
MT system: confidence indicators
Translation: fluency indicators
Source text and translation: adequacy indicators
Learning algorithms: wide range
Datasets: few with absolute human scores (1-4/5 scores, PE time, edit distance)
State-of-the-art indicators
Shallow indicators:
(S/T/S-T) Sentence length
(S/T) Language model
(S/T) Token-type ratio
(S) Average number of possible translations per word
(S) % of n-grams belonging to different frequency quartiles of a source language corpus
(T) Untranslated/OOV words
(T) Mismatching brackets, quotation marks
(S-T) Preservation of punctuation
(S-T) Word alignment score, etc.
These do well for estimating post-editing effort...
...but are not enough for other aspects of quality, e.g. adequacy
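A few of the shallow indicators listed above can be computed with nothing but the source and the translation. The sketch below is illustrative only: the feature names, whitespace tokenisation and the example sentences are assumptions, and indicators that need external resources (language models, translation tables, alignments) are left out.

```python
import string

def shallow_indicators(source, translation):
    """Compute a few reference-free shallow indicators for one segment pair."""
    s_tok, t_tok = source.split(), translation.split()
    punct = set(string.punctuation)
    s_punct = sum(ch in punct for ch in source)
    t_punct = sum(ch in punct for ch in translation)
    return {
        "source_length": len(s_tok),
        "target_length": len(t_tok),
        "length_ratio": len(t_tok) / max(len(s_tok), 1),
        # Distinct words divided by total words in the translation.
        "target_type_token_ratio": len(set(t_tok)) / max(len(t_tok), 1),
        # Absolute difference in punctuation marks between source and target.
        "punctuation_diff": abs(s_punct - t_punct),
        # Non-zero when opening and closing round brackets do not balance.
        "bracket_mismatch": abs(translation.count("(") - translation.count(")")),
    }

feats = shallow_indicators("A bateria dura 6 horas (em média).",
                           "The battery lasts 6 hours (on average.")
print(feats)
```

In this invented pair the lengths match, but the missing closing bracket shows up in both `punctuation_diff` and `bracket_mismatch`.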
State-of-the-art indicators
Linguistic indicators - count-based:
(S/T/S-T) Content/non-content words
(S/T/S-T) Nouns/verbs/... NP/VP/...
(S/T/S-T) Deictics (references)
(S/T/S-T) Discourse markers (references)
(S/T/S-T) Named entities
(S/T/S-T) Zero-subjects
(S/T/S-T) Pronominal subjects
(S/T/S-T) Negation indicators
(T) Subject-verb / adjective-noun agreement
(T) Language Model of POS
(T) Grammar checking (dangling words)
(T) Coherence
State-of-the-art indicators
Linguistic indicators - alignment-based:
(S-T) Correct translation of pronouns
(S-T) Matching of dependency relations
(S-T) Matching of named entities
(S-T) Alignment of parse trees
(S-T) Alignment of predicates & arguments, etc.
Some indicators are language-dependent, others need
resources that are language-dependent, but apply to most
languages, e.g. LM of POS tags
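State-of-the-art indicators
Fine-grained, lexicalised indicators:
$$f_{\text{target-word} = \text{``process''}} = \begin{cases} 1, & \text{if source-word} = \text{``hdhh alamlyt''} \\ 0, & \text{otherwise} \end{cases}$$
$$f_{\text{target-word} = \text{``process''}} = \begin{cases} 1, & \text{if source-pos} = \text{``DT DTNN''} \\ 0, & \text{otherwise} \end{cases}$$
Closer to error detection
Need large amounts of training data [BHAO11], or rule-based (RB) approaches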
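The two lexicalised indicators above are simple binary features. A minimal sketch, assuming each target token is available as a dictionary with its aligned source words and source POS tags (the dictionary keys and the `make_lexical_indicator` helper are hypothetical):

```python
def make_lexical_indicator(target_word, source_key, source_value):
    """Return a binary feature: 1 if the target word co-occurs with the given
    source-side property (aligned source words or POS tags), else 0."""
    def indicator(token):
        hit = (token["target_word"] == target_word
               and token.get(source_key) == source_value)
        return 1 if hit else 0
    return indicator

by_source_word = make_lexical_indicator("process", "source_word", "hdhh alamlyt")
by_source_pos = make_lexical_indicator("process", "source_pos", "DT DTNN")

token = {"target_word": "process", "source_word": "hdhh alamlyt", "source_pos": "DT DTNN"}
other = {"target_word": "process", "source_word": "alamlyt", "source_pos": "NN"}

print(by_source_word(token), by_source_pos(token))  # 1 1
print(by_source_word(other), by_source_pos(other))  # 0 0
```

Since one such feature exists per word pair, the feature space is very sparse, which is why large amounts of training data are needed.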
Do these indicators work?
To some extent... Issues:
Representation of shallow/deep indicators: counts, ratios, (absolute) differences?
$$F = S - T, \quad F = |S - T|, \quad F = \frac{T}{S}, \quad F = \frac{S - T}{S}, \quad \ldots$$
Resources to extract deep indicators: availability and reliability
Data to extract fine-grained indicators: need previously translated and post-edited data, esp. for negative examples
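The four candidate representations can be compared side by side for any indicator that yields a source-side count S and a target-side count T. In this sketch the dictionary keys, the example counts and the choice to return 0.0 when S is zero are assumptions.

```python
def representations(s, t):
    """Combine a source-side count s and a target-side count t in four ways."""
    return {
        "difference": s - t,                                   # F = S - T
        "abs_difference": abs(s - t),                          # F = |S - T|
        "ratio": t / s if s else 0.0,                          # F = T / S
        "normalised_difference": (s - t) / s if s else 0.0,    # F = (S - T) / S
    }

# Hypothetical counts: 4 named entities in the source, 3 in the translation.
reps = representations(4, 3)
print(reps)
```

For these counts the values are 1, 1, 0.75 and 0.25: the same underlying observation gives a learner quite different inputs depending on the representation chosen.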
Manual scoring: agreement between translators
Absolute value judgements: difficult to achieve consistency
across annotators even in highly controlled setup
en-es news WMT12 dataset: 3 professional
translators, 1-5 scores
15% of initial dataset discarded: annotators disagreed by
more than one category
Remaining annotations had to be scaled (0.33, 0.17,
0.50)
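The filtering and scaling steps can be sketched as follows. Two things here are assumptions rather than facts from the slide: that (0.33, 0.17, 0.50) are per-annotator weights in a weighted mean, and that "disagreed by more than one category" means the highest and lowest score differ by more than 1.

```python
WEIGHTS = (0.33, 0.17, 0.50)  # assumed to be one weight per annotator

def combine_scores(scores, weights=WEIGHTS, max_gap=1):
    """Return the weighted mean of three 1-5 scores, or None when the
    annotators differ by more than max_gap categories (segment discarded)."""
    if max(scores) - min(scores) > max_gap:
        return None
    return sum(w * s for w, s in zip(weights, scores))

kept = combine_scores((4, 5, 4))      # within one category: kept
dropped = combine_scores((2, 5, 3))   # three categories apart: discarded
print(kept, dropped)
```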
Manual scoring: Agreement between translators
en-pt subtitles of TV series: 3 non-professional annotators, 1-4 scores
351 cases (41%): full agreement
445 cases (52%): partial agreement
54 cases (7%): null agreement
Agreement by score:
Score   Full agreement
4       59%
3       35%
2       23%
1       50%
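The three agreement categories can be computed per segment as sketched below, assuming "partial" means exactly two of the three annotators gave the same score and "null" means all three differ. The annotations are invented.

```python
from collections import Counter

def agreement_level(scores):
    """Classify three annotators' scores for one segment."""
    distinct = len(set(scores))
    if distinct == 1:
        return "full"      # all three annotators gave the same score
    if distinct == 2:
        return "partial"   # exactly two annotators agreed
    return "null"          # all three scores differ

# Invented annotations (1-4 scale), one tuple per segment.
annotations = [(4, 4, 4), (3, 3, 2), (1, 2, 4), (2, 2, 2), (4, 3, 4)]
summary = Counter(agreement_level(a) for a in annotations)
print(summary)
```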
More objective ways of annotating translations
HTER: Edit distance between MT output and its minimally post-edited version
$$\text{HTER} = \frac{\#\text{edits}}{\#\text{words in post-edited version}}$$
Edits: substitute, delete, insert, shift
Analysis by Maarit Koponen (WMT-12) on post-edited translations with HTER and 1-5 scores
A number of cases where translations with low HTER (few edits) were assigned low quality scores (high post-editing effort), and vice-versa
Certain edits seem to require more cognitive effort than others - not captured by HTER
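A rough approximation of HTER can be computed with a word-level edit distance. This sketch deliberately leaves out shift operations, so it is not the full HTER computation and can overestimate it; whitespace tokenisation and the function names are also assumptions.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens (substitute, delete, insert)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def approx_hter(mt_output, post_edited):
    """Edits divided by the number of words in the post-edited version.
    Shifts are not modelled, so this can overestimate true HTER."""
    mt, pe = mt_output.split(), post_edited.split()
    return word_edit_distance(mt, pe) / max(len(pe), 1)

score = approx_hter("Do buy this product", "Do not buy this product")
print(score)  # one insertion over five words: 0.2
```

The example also illustrates the limitation noted above: a single cheap edit can correspond to a severe meaning error.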
More objective ways of annotating translations
TIME: varies considerably across translators (expected)
[Figure: post-editing time in seconds (0-600) per segment (1-20), one series per annotator A1-A8]
[Figure: post-editing time in seconds per word (0-25) per segment (1-20), one series per annotator A1-A8]
Can we normalise this variation?
A dedicated QE system for each translator?
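One possible way to normalise the variation, offered only as an illustration and not as the approach taken in the talk, is to convert each annotator's times to z-scores relative to that annotator's own mean and standard deviation. The data below are invented.

```python
from statistics import mean, pstdev

def normalise_per_annotator(times):
    """Map each annotator's seconds-per-word values to z-scores, so that
    a habitually slow and a habitually fast post-editor become comparable."""
    normalised = {}
    for annotator, values in times.items():
        mu, sigma = mean(values), pstdev(values)
        normalised[annotator] = [(v - mu) / sigma if sigma else 0.0 for v in values]
    return normalised

# Invented seconds-per-word figures for the same three segments.
times = {"A1": [1.0, 2.0, 3.0], "A2": [4.0, 8.0, 12.0]}
z = normalise_per_annotator(times)
print(z)
```

After normalisation the two annotators have identical profiles, although A2 is four times slower in absolute terms. Whether this removes too much real signal is part of the open question.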
More objective ways of annotating translations
Time, HTER, Keystrokes: data from 8 post-editors
PET: http://pers-www.wlv.ac.uk/~in1676/pet/
How to use estimated PE effort scores?
Should (supposedly) bad quality translations be filtered
out or shown to translators (different scores/colour
codes as in TMs)?
Wasting time to read scores and translations vs wasting
“gisting” information
How to define a threshold on the estimated translation
quality to decide what should be filtered out?
Translator dependent
Task dependent (SDL)
Do translators prefer detailed estimates (sub-sentence
level) or an overall estimate for the complete sentence?
Too much information vs hard-to-interpret scores
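The filtering option with a translator-dependent threshold could look like the sketch below. The 1-5 effort scale, the threshold values and the function name are hypothetical; choosing the thresholds is exactly the open issue raised above.

```python
def filter_for_post_editing(segments, threshold):
    """Split segments into those shown to the translator and those hidden.
    Each segment is (translation, estimated_effort); lower effort is better."""
    shown = [seg for seg in segments if seg[1] <= threshold]
    hidden = [seg for seg in segments if seg[1] > threshold]
    return shown, hidden

# Hypothetical per-translator thresholds on a 1-5 effort scale.
THRESHOLDS = {"translator_a": 2.5, "translator_b": 3.5}

segments = [("t1", 1.8), ("t2", 3.0), ("t3", 4.6)]
shown_a, hidden_a = filter_for_post_editing(segments, THRESHOLDS["translator_a"])
shown_b, hidden_b = filter_for_post_editing(segments, THRESHOLDS["translator_b"])
print(len(shown_a), len(shown_b))  # 1 2
```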
Conclusions
It is possible to estimate at least certain aspects of MT quality, esp. wrt PE effort: QuEst
http://quest.dcs.shef.ac.uk/
PE effort estimates can be used in real applications
Ranking translations: filter out bad quality translations
Selecting translations from multiple MT systems
Commercial products by SDL (document-level for gisting) and Multilizer
A number of open issues to be investigated...
Collaboration with “human translators” essential
My vision
Sub-sentence level QE (error detection), highlighting errors but also giving an overall estimate for the sentence
References
[Aut11] Autodesk. Translation and Post-Editing Productivity. http://translate.autodesk.com/productivity.html, 2011.
[BHAO11] Nguyen Bach, Fei Huang, and Yaser Al-Onaizan. Goodness: a method for measuring machine translation confidence. Pages 211–219, Portland, Oregon, 2011.
[DM12] Markus Dreyer and Daniel Marcu. HyTER: Meaning-equivalent semantics for translation evaluation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 162–171, Montréal, Canada, 2012.
[Int10] Intel. Being Streetwise with Machine Translation in an Enterprise Neighborhood. http://mtmarathon2010.info/JEC2010_Burgett_slides.pptx, 2010.
[Int12] Intel. Enabling Multilingual Collaboration through Machine Translation. http://media12.connectedsocialmedia.com/intel/06/8647/Enabling_Multilingual_Collaboration_Machine_Translation.pdf, 2012.

Lucia Specia - Estimativa de qualidade em TA

  • 1. Quality of Machine Translation Quality Estimation Open issues Conclusions Estimativa da qualidade da tradu¸c˜ao autom´atica Lucia Specia University of Sheffield l.specia@sheffield.ac.uk Faculdade de Letras da Universidade do Porto 13 May 2013 Estimativa da qualidade da tradu¸c˜ao autom´atica 1 / 31
  • 2. Quality of Machine Translation Quality Estimation Open issues Conclusions Outline 1 Quality of Machine Translation 2 Quality Estimation 3 Open issues 4 Conclusions Estimativa da qualidade da tradu¸c˜ao autom´atica 2 / 31
  • 3. Quality of Machine Translation Quality Estimation Open issues Conclusions Outline 1 Quality of Machine Translation 2 Quality Estimation 3 Open issues 4 Conclusions Estimativa da qualidade da tradu¸c˜ao autom´atica 3 / 31
  • 4. Quality of Machine Translation Quality Estimation Open issues Conclusions Introduction Machine Translation: Around since the early 1950s Estimativa da qualidade da tradu¸c˜ao autom´atica 4 / 31
  • 5. Quality of Machine Translation Quality Estimation Open issues Conclusions Introduction Machine Translation: Around since the early 1950s Increasingly more popular since 1990: statistical approaches Estimativa da qualidade da tradu¸c˜ao autom´atica 4 / 31
  • 6. Quality of Machine Translation Quality Estimation Open issues Conclusions Introduction Machine Translation: Around since the early 1950s Increasingly more popular since 1990: statistical approaches Software tools and data available to build translation systems - Moses and others Estimativa da qualidade da tradu¸c˜ao autom´atica 4 / 31
  • 7. Quality of Machine Translation Quality Estimation Open issues Conclusions Introduction Machine Translation: Around since the early 1950s Increasingly more popular since 1990: statistical approaches Software tools and data available to build translation systems - Moses and others Increasing demand for cheaper and fast translations Estimativa da qualidade da tradu¸c˜ao autom´atica 4 / 31
  • 8. Quality of Machine Translation Quality Estimation Open issues Conclusions Introduction Machine Translation: Around since the early 1950s Increasingly more popular since 1990: statistical approaches Software tools and data available to build translation systems - Moses and others Increasing demand for cheaper and fast translations How do we measure quality and progress over time? So far... mostly automatic evaluation metrics Estimativa da qualidade da tradu¸c˜ao autom´atica 4 / 31
  • 9. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics N-gram matching between system output and one or more reference translations: BLEU and many others Estimativa da qualidade da tradu¸c˜ao autom´atica 5 / 31
  • 10. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics N-gram matching between system output and one or more reference translations: BLEU and many others Issue 1: Too many possible good quality translations, need thousands of references to capture valid variations Estimativa da qualidade da tradu¸c˜ao autom´atica 5 / 31
  • 11. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics N-gram matching between system output and one or more reference translations: BLEU and many others Issue 1: Too many possible good quality translations, need thousands of references to capture valid variations Solution: HyTER (Language Weaver) annotation tool to generate all possible correct translations! [DM12] Translations built bottom-up from word/phrase translation equivalents using FSA 2-2.5 hours worth of expert annotation per sentence One annotator: 5.2 × 106 paths A bunch of annotators: 8.5 × 1011 paths Estimativa da qualidade da tradu¸c˜ao autom´atica 5 / 31
  • 12. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics Issue 2: Difficult to quantify severity of mismatching n-grams Estimativa da qualidade da tradu¸c˜ao autom´atica 6 / 31
  • 13. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics Issue 2: Difficult to quantify severity of mismatching n-grams ref Do not buy this product, it’s their craziest invention! sys Do buy this product, it’s their craziest invention! Estimativa da qualidade da tradu¸c˜ao autom´atica 6 / 31
  • 14. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics Issue 2: Difficult to quantify severity of mismatching n-grams ref Do not buy this product, it’s their craziest invention! sys Do buy this product, it’s their craziest invention! Some attempts to weight mismatches differently - sparse, lexicalised approach Estimativa da qualidade da tradu¸c˜ao autom´atica 6 / 31
  • 15. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics Issue 2: Difficult to quantify severity of mismatching n-grams ref Do not buy this product, it’s their craziest invention! sys Do buy this product, it’s their craziest invention! Some attempts to weight mismatches differently - sparse, lexicalised approach However, same error is more or less important depending on the user or purpose: Severe if end-user does not speak source language Trivial to post-edit by translators Estimativa da qualidade da tradu¸c˜ao autom´atica 6 / 31
  • 16. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics Conversely: ref The battery lasts 6 hours and it can be fully recharged in 30 minutes. sys Six-hours battery, 30 minutes to full charge last. Estimativa da qualidade da tradu¸c˜ao autom´atica 7 / 31
  • 17. Quality of Machine Translation Quality Estimation Open issues Conclusions MT evaluation metrics Conversely: ref The battery lasts 6 hours and it can be fully recharged in 30 minutes. sys Six-hours battery, 30 minutes to full charge last. Ok for gisting - meaning preserved Very costly for post-editing if style is to be preserved Estimativa da qualidade da tradu¸c˜ao autom´atica 7 / 31
  • 18. Quality of Machine Translation Quality Estimation Open issues Conclusions Task-based evaluation Measure translation quality within task. E.g. Autodesk - Productivity test through post-editing [Aut11] 2-day translation and post-editing , 37 participants In-house Moses (Autodesk data: software) Time spent on each segment Estimativa da qualidade da tradu¸c˜ao autom´atica 8 / 31
  • 19. Quality of Machine Translation Quality Estimation Open issues Conclusions Task-based evaluation E.g.: Intel - User satisfaction with un-edited MT Translation is good if customer can solve problem Estimativa da qualidade da tradu¸c˜ao autom´atica 9 / 31
  • 20. Quality of Machine Translation Quality Estimation Open issues Conclusions Task-based evaluation E.g.: Intel - User satisfaction with un-edited MT Translation is good if customer can solve problem MT for Customer Support websites [Int10] Overall customer satisfaction: 75% for English→Chinese Estimativa da qualidade da tradu¸c˜ao autom´atica 9 / 31
  • 21. Quality of Machine Translation Quality Estimation Open issues Conclusions Task-based evaluation E.g.: Intel - User satisfaction with un-edited MT Translation is good if customer can solve problem MT for Customer Support websites [Int10] Overall customer satisfaction: 75% for English→Chinese 95% reduction in cost Project cycle from 10 days to 1 day From 300 to 60,000 words translated/hour Estimativa da qualidade da tradu¸c˜ao autom´atica 9 / 31
  • 22. Quality of Machine Translation Quality Estimation Open issues Conclusions Task-based evaluation E.g.: Intel - User satisfaction with un-edited MT Translation is good if customer can solve problem MT for Customer Support websites [Int10] Overall customer satisfaction: 75% for English→Chinese 95% reduction in cost Project cycle from 10 days to 1 day From 300 to 60,000 words translated/hour Customers in China using MT texts were more satisfied with support than natives using original texts (68%)! Estimativa da qualidade da tradu¸c˜ao autom´atica 9 / 31
  • 23. Quality of Machine Translation Quality Estimation Open issues Conclusions Task-based evaluation E.g.: Intel - User satisfaction with un-edited MT Translation is good if customer can solve problem MT for Customer Support websites [Int10] Overall customer satisfaction: 75% for English→Chinese 95% reduction in cost Project cycle from 10 days to 1 day From 300 to 60,000 words translated/hour Customers in China using MT texts were more satisfied with support than natives using original texts (68%)! MT for chat and community forums [Int12] ∼60% “understandable and actionable” (→English/Spanish) Max ∼10% “not understandable” (→Chinese) Estimativa da qualidade da tradu¸c˜ao autom´atica 9 / 31
  • 24. Quality of Machine Translation Quality Estimation Open issues Conclusions Outline 1 Quality of Machine Translation 2 Quality Estimation 3 Open issues 4 Conclusions Estimativa da qualidade da tradu¸c˜ao autom´atica 10 / 31
  • 25. Quality of Machine Translation Quality Estimation Open issues Conclusions Overview Metrics either depend on references or post-editing/use of translations (task-based) Estimativa da qualidade da tradu¸c˜ao autom´atica 11 / 31
  • 26. Quality of Machine Translation Quality Estimation Open issues Conclusions Overview Metrics either depend on references or post-editing/use of translations (task-based) Our proposal Quality assessment without reference, prior to post-editing/use of translations Estimativa da qualidade da tradu¸c˜ao autom´atica 11 / 31
  • 27. Quality of Machine Translation Quality Estimation Open issues Conclusions Overview Why don’t translators use (more) MT? Estimativa da qualidade da tradu¸c˜ao autom´atica 12 / 31
  • 28. Quality of Machine Translation Quality Estimation Open issues Conclusions Overview Why don’t translators use (more) MT? Translations are not good enough! Estimativa da qualidade da tradu¸c˜ao autom´atica 12 / 31
  • 29. Quality of Machine Translation Quality Estimation Open issues Conclusions Overview Why don’t translators use (more) MT? Translations are not good enough! What about TMs? Aren’t fuzzy matches useful? Estimativa da qualidade da tradu¸c˜ao autom´atica 12 / 31
  • 30. Quality of Machine Translation Quality Estimation Open issues Conclusions Overview Why don’t translators use (more) MT? Translations are not good enough! What about TMs? Aren’t fuzzy matches useful? Estimativa da qualidade da tradu¸c˜ao autom´atica 12 / 31
  • 31. Quality of Machine Translation Quality Estimation Open issues Conclusions Framework Quality estimation (QE): provide an estimate of quality for new translated text *before* it is post-edited Quality = post-editing effort Estimativa da qualidade da tradu¸c˜ao autom´atica 13 / 31
  • 32. Quality of Machine Translation Quality Estimation Open issues Conclusions Framework Quality estimation (QE): provide an estimate of quality for new translated text *before* it is post-edited Quality = post-editing effort No access to reference translations: machine learning techniques to predict post-editing effort scores Estimativa da qualidade da tradu¸c˜ao autom´atica 13 / 31
  • 33. Quality of Machine Translation Quality Estimation Open issues Conclusions Framework Quality estimation (QE): provide an estimate of quality for new translated text *before* it is post-edited Quality = post-editing effort No access to reference translations: machine learning techniques to predict post-editing effort scores Considers interaction with TM systems: only used for low fuzzy match cases, or to select between TM and MT Estimativa da qualidade da tradu¸c˜ao autom´atica 13 / 31
  • 34. Quality of Machine Translation Quality Estimation Open issues Conclusions Framework Quality estimation (QE): provide an estimate of quality for new translated text *before* it is post-edited Quality = post-editing effort No access to reference translations: machine learning techniques to predict post-editing effort scores Considers interaction with TM systems: only used for low fuzzy match cases, or to select between TM and MT QTLaunchPad project Multidimensional Quality Metrics for MT and HT, for manual and (semi-)automatic evaluation (QE): http://www.qt21.eu/launchpad/ Estimativa da qualidade da tradu¸c˜ao autom´atica 13 / 31
  • 35. Quality of Machine Translation Quality Estimation Open issues Conclusions Framework QE system Examples: source & translations, quality scores Quality indicators Estimativa da qualidade da tradu¸c˜ao autom´atica 14 / 31
  • 36. Quality of Machine Translation Quality Estimation Open issues Conclusions Framework Source text MT system Translation QE system Quality score Examples: source & translations, quality scores Quality indicators Estimativa da qualidade da tradu¸c˜ao autom´atica 14 / 31
  • 37. Quality of Machine Translation Quality Estimation Open issues Conclusions Examples of positive results Time to post-edit subset of sentences predicted as “good” (low effort) vs time to post-edit random subset of sentences Estimativa da qualidade da tradu¸c˜ao autom´atica 15 / 31
  • 38. Quality of Machine Translation Quality Estimation Open issues Conclusions Examples of positive results Time to post-edit subset of sentences predicted as “good” (low effort) vs time to post-edit random subset of sentences Language no QE QE fr-en 0.75 words/sec 1.09 words/sec en-es 0.32 words/sec 0.57 words/sec Estimativa da qualidade da tradu¸c˜ao autom´atica 15 / 31
  • 39. Quality of Machine Translation Quality Estimation Open issues Conclusions Examples of positive results Time to post-edit subset of sentences predicted as “good” (low effort) vs time to post-edit random subset of sentences Language no QE QE fr-en 0.75 words/sec 1.09 words/sec en-es 0.32 words/sec 0.57 words/sec Accuracy in selecting best translation among 4 MT systems Best MT system Highest QE score 54% 77% Estimativa da qualidade da tradu¸c˜ao autom´atica 15 / 31
  • 40. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art Quality indicators: Source text TranslationMT system Confidence indicators Complexity indicators Fluency indicators Adequacy indicators Estimativa da qualidade da tradu¸c˜ao autom´atica 16 / 31
  • 41. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art Quality indicators: Source text TranslationMT system Confidence indicators Complexity indicators Fluency indicators Adequacy indicators Learning algorithms: wide range Estimativa da qualidade da tradu¸c˜ao autom´atica 16 / 31
  • 42. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art Quality indicators: Source text TranslationMT system Confidence indicators Complexity indicators Fluency indicators Adequacy indicators Learning algorithms: wide range Datasets: few with absolute human scores (1-4/5 scores, PE time, edit distance) Estimativa da qualidade da tradu¸c˜ao autom´atica 16 / 31
  • 43. Quality of Machine Translation Quality Estimation Open issues Conclusions Outline 1 Quality of Machine Translation 2 Quality Estimation 3 Open issues 4 Conclusions Estimativa da qualidade da tradu¸c˜ao autom´atica 17 / 31
  • 44. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Shallow indicators: (S/T/S-T) Sentence length (S/T) Language model (S/T) Token-type ratio (S) Average number of possible translations per word (S) % of n-grams belonging to different frequency quartiles of a source language corpus (T) Untranslated/OOV words (T) Mismatching brackets, quotation marks (S-T) Preservation of punctuation (S-T) Word alignment score, etc. Estimativa da qualidade da tradu¸c˜ao autom´atica 18 / 31
  • 45. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Shallow indicators: (S/T/S-T) Sentence length (S/T) Language model (S/T) Token-type ratio (S) Average number of possible translations per word (S) % of n-grams belonging to different frequency quartiles of a source language corpus (T) Untranslated/OOV words (T) Mismatching brackets, quotation marks (S-T) Preservation of punctuation (S-T) Word alignment score, etc. These do well for estimation post-editing effort... ...but are not enough for other aspects of quality, e.g. adequacy Estimativa da qualidade da tradu¸c˜ao autom´atica 18 / 31
  • 46. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Linguistic indicators - count-based: (S/T/S-T) Content/non-content words (S/T/S-T) Nouns/verbs/... NP/VP/... (S/T/S-T) Deictics (references) (S/T/S-T) Discourse markers (references) (S/T/S-T) Named entities (S/T/S-T) Zero-subjects (S/T/S-T) Pronominal subjects (S/T/S-T) Negation indicators (T) Subject-verb / adjective-noun agreement (T) Language Model of POS (T) Grammar checking (dangling words) (T) Coherence Estimativa da qualidade da tradu¸c˜ao autom´atica 19 / 31
  • 47. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Linguistic indicators - alignment-based: (S-T) Correct translation of pronouns (S-T) Matching of dependency relations (S-T) Matching of named entities (S-T) Alignment of parse trees (S-T) Alignment of predicates & arguments, etc. Estimativa da qualidade da tradu¸c˜ao autom´atica 20 / 31
  • 48. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Linguistic indicators - alignment-based: (S-T) Correct translation of pronouns (S-T) Matching of dependency relations (S-T) Matching of named entities (S-T) Alignment of parse trees (S-T) Alignment of predicates & arguments, etc. Some indicators are language-dependent, others need resources that are language-dependent, but apply to most languages, e.g. LM of POS tags Estimativa da qualidade da tradu¸c˜ao autom´atica 20 / 31
  • 49. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Fine-grained, lexicalised indicators: target-word = “process” = 1, if source-word = “hdhh alamlyt”. 0, otherwise. target-word = “process” = 1, if source-pos = “DT DTNN”. 0, otherwise. Estimativa da qualidade da tradu¸c˜ao autom´atica 21 / 31
  • 50. Quality of Machine Translation Quality Estimation Open issues Conclusions State-of-the-art indicators Fine-grained, lexicalised indicators: target-word = “process” = 1, if source-word = “hdhh alamlyt”. 0, otherwise. target-word = “process” = 1, if source-pos = “DT DTNN”. 0, otherwise. Closer to error detection Need large amounts of training data [BHAO11], or RB approaches Estimativa da qualidade da tradu¸c˜ao autom´atica 21 / 31
  • 54. Do these indicators work? To some extent... Issues: (1) Representation of shallow/deep indicators: counts, ratios, (absolute) differences? $F = S - T$, $F = |S - T|$, $F = \frac{T}{S}$, $F = \frac{S - T}{S}$, ... (2) Resources to extract deep indicators: availability and reliability. (3) Data to extract fine-grained indicators: previously translated and post-edited data are needed, especially for negative examples.
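The four candidate representations listed on this slide can be computed side by side from a source-side count S and a target-side count T. This is a small illustrative sketch; the function name, the dictionary keys, and the fallback to 0.0 when S is zero are assumptions made here, not choices stated in the talk.

```python
def indicator_representations(s, t):
    """Combine a source count s and a target count t into candidate features."""
    return {
        "difference": s - t,                            # F = S - T
        "abs_difference": abs(s - t),                   # F = |S - T|
        "ratio": t / s if s else 0.0,                   # F = T / S
        "normalised_difference": (s - t) / s if s else 0.0,  # F = (S - T) / S
    }


# Example: 4 named entities in the source, 3 in the translation.
reps = indicator_representations(s=4, t=3)
for name, value in reps.items():
    print(name, value)
```

The ratio-based variants are length-independent but undefined for S = 0, which is one practical reason the choice of representation matters.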
  • 56. Manual scoring: agreement between translators. Absolute value judgements: consistency across annotators is difficult to achieve, even in a highly controlled setup. en-es news WMT12 dataset: 3 professional translators, 1-5 scores; 15% of the initial dataset was discarded because annotators disagreed by more than one category; the remaining annotations had to be scaled (0.33, 0.17, 0.50).
  • 58. Manual scoring: agreement between translators. en-pt subtitles of TV series: 3 non-professional annotators, 1-4 scores. 351 cases (41%): full agreement; 445 cases (52%): partial agreement; 54 cases (7%): null agreement. Full agreement by score: score 4: 59%; score 3: 35%; score 2: 23%; score 1: 50%.
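The slide does not define the three agreement levels. A plausible reading, assumed in the sketch below, is that "full" means all three annotators gave the same score, "partial" means exactly two agreed, and "null" means all three differed. The example annotations are invented for illustration.

```python
from collections import Counter


def agreement_level(scores):
    """Classify the scores of three annotators for one segment.

    Assumed definitions: full = 3 identical, partial = 2 identical, null = none.
    """
    most_common_count = Counter(scores).most_common(1)[0][1]
    if most_common_count == 3:
        return "full"
    if most_common_count == 2:
        return "partial"
    return "null"


# Invented annotations on the 1-4 scale: one tuple per segment.
annotations = [(4, 4, 4), (3, 3, 2), (1, 2, 4), (1, 1, 1)]
counts = Counter(agreement_level(a) for a in annotations)
print(dict(counts))
```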
  • 62. More objective ways of annotating translations. HTER: edit distance between the MT output and its minimally post-edited version, $\mathrm{HTER} = \frac{\#\text{edits}}{\#\text{words in post-edited version}}$. Edits: substitute, delete, insert, shift. Analysis by Maarit Koponen (WMT-12) on post-edited translations with HTER and 1-5 scores: in a number of cases, translations with low HTER (few edits) were assigned low quality scores (high post-editing effort), and vice versa. Certain edits seem to require more cognitive effort than others, which HTER does not capture.
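To make the HTER ratio concrete, the sketch below divides a word-level edit count by the length of the post-edited version. It is a simplification: it uses plain Levenshtein distance (substitute, delete, insert) and does not model the shift operation that HTER also counts, so it can overestimate the true score when words are reordered. Whitespace tokenisation and the function names are assumptions.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (substitute, delete, insert; no shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approximate_hter(mt_output, post_edited):
    """Edits divided by the number of words in the post-edited version."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited version must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


score = approximate_hter("the cat sat in mat", "the cat sat on the mat")
print(round(score, 3))
```

In this example two edits are needed (one substitution and one insertion) against a six-word post-edited sentence. The score says nothing about how hard those two edits were to identify, which is the limitation noted on the slide.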
  • 63. More objective ways of annotating translations. TIME: varies considerably across translators (expected). [Figure: seconds per segment, segments 1-20, annotators A1-A8.] Can we normalise this variation? A dedicated QE system for each translator?
  • 64. More objective ways of annotating translations. TIME: varies considerably across translators (expected). [Figure: seconds per word by segment, segments 1-20, annotators A1-A8.] Can we normalise this variation? A dedicated QE system for each translator?
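The slides leave open how the variation across translators might be normalised. One possible option, shown here purely as an illustration and not as the approach proposed in the talk, is to convert times to seconds per word and then standardise each annotator's values against that annotator's own mean and standard deviation. The timing values are invented.

```python
from statistics import mean, pstdev


def seconds_per_word(seconds, n_words):
    """Length-normalised post-editing time for one segment."""
    return seconds / n_words


def standardise_per_annotator(times_by_annotator):
    """Z-score each annotator's times against their own mean and deviation."""
    normalised = {}
    for annotator, times in times_by_annotator.items():
        mu = mean(times)
        sigma = pstdev(times)
        normalised[annotator] = [
            (t - mu) / sigma if sigma else 0.0 for t in times
        ]
    return normalised


# Invented seconds-per-word values: A2 is five times slower than A1 overall.
times = {"A1": [2.0, 4.0, 6.0], "A2": [10.0, 20.0, 30.0]}
z = standardise_per_annotator(times)
print(z["A1"])
print(z["A2"])
```

After standardisation the two annotators have identical profiles, because their relative pattern across segments is the same even though their absolute speeds differ.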
  • 65. More objective ways of annotating translations. Time, HTER, keystrokes: data from 8 post-editors.
  • 66. More objective ways of annotating translations. PET: http://pers-www.wlv.ac.uk/~in1676/pet/
  • 69. How to use estimated PE effort scores? Should (supposedly) bad-quality translations be filtered out or shown to translators (with different scores/colour codes, as in TMs)? Wasting time reading scores and translations vs wasting “gisting” information. How to define a threshold on the estimated translation quality to decide what should be filtered out? It is translator-dependent and task-dependent (SDL). Do translators prefer detailed estimates (sub-sentence level) or an overall estimate for the complete sentence? Too much information vs hard-to-interpret scores.
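The filtering question can be illustrated with a threshold applied to estimated PE effort, where the threshold differs per translator. Everything in this sketch is hypothetical: the 0-1 effort scale (lower meaning less effort), the threshold values, and the field names are assumptions chosen only to show the mechanism.

```python
def split_by_threshold(segments, threshold):
    """Split segments into (shown, filtered) by estimated PE effort."""
    shown = [s for s in segments if s["estimated_effort"] <= threshold]
    filtered = [s for s in segments if s["estimated_effort"] > threshold]
    return shown, filtered


# Hypothetical per-translator thresholds on a 0-1 effort scale.
thresholds = {"translator_a": 0.3, "translator_b": 0.5}

segments = [
    {"id": 1, "estimated_effort": 0.10},
    {"id": 2, "estimated_effort": 0.40},
    {"id": 3, "estimated_effort": 0.80},
]

shown_a, filtered_a = split_by_threshold(segments, thresholds["translator_a"])
shown_b, filtered_b = split_by_threshold(segments, thresholds["translator_b"])
print([s["id"] for s in shown_a], [s["id"] for s in shown_b])
```

The same segment (id 2) is offered to one translator and withheld from the other, which is what a translator-dependent threshold implies in practice.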
  • 76. Conclusions. It is possible to estimate at least certain aspects of MT quality, especially with respect to PE effort: QuEst, http://quest.dcs.shef.ac.uk/. PE effort estimates can be used in real applications: ranking translations (filtering out bad-quality translations) and selecting translations from multiple MT systems. Commercial products are available from SDL (document-level, for gisting) and Multilizer. A number of open issues remain to be investigated... Collaboration with “human translators” is essential. My vision: sub-sentence level QE (error detection), highlighting errors but also giving an overall estimate for the sentence.
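Ranking and system selection, the two applications named in the conclusions, both reduce to ordering candidate translations by their estimated PE effort. The sketch below assumes that a QE model has already produced a score for each candidate; the system names, translations, and scores are placeholders, and no call to QuEst or any other real tool is shown.

```python
def rank_translations(candidates):
    """Order candidates from lowest to highest estimated PE effort."""
    return sorted(candidates, key=lambda c: c["estimated_effort"])


def select_best_translation(candidates):
    """Pick the candidate with the lowest estimated PE effort."""
    return min(candidates, key=lambda c: c["estimated_effort"])


# Placeholder outputs of three MT systems for the same source sentence.
candidates = [
    {"system": "mt_system_1", "translation": "candidate one", "estimated_effort": 0.42},
    {"system": "mt_system_2", "translation": "candidate two", "estimated_effort": 0.18},
    {"system": "mt_system_3", "translation": "candidate three", "estimated_effort": 0.65},
]

best = select_best_translation(candidates)
ranking = [c["system"] for c in rank_translations(candidates)]
print(best["system"], ranking)
```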
  • 77. Estimativa da qualidade da tradução automática (Quality estimation of machine translation). Lucia Specia, University of Sheffield, l.specia@sheffield.ac.uk. Faculdade de Letras da Universidade do Porto, 13 May 2013.
  • 78. References. Autodesk. Translation and Post-Editing Productivity. http://translate.autodesk.com/productivity.html, 2011. Nguyen Bach, Fei Huang, and Yaser Al-Onaizan. Goodness: a method for measuring machine translation confidence. Pages 211–219, Portland, Oregon, 2011. Markus Dreyer and Daniel Marcu. HyTER: Meaning-equivalent semantics for translation evaluation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 162–171, Montréal, Canada, 2012.
  • 79. References (continued). Intel. Being Streetwise with Machine Translation in an Enterprise Neighborhood. http://mtmarathon2010.info/JEC2010_Burgett_slides.pptx, 2010. Intel. Enabling Multilingual Collaboration through Machine Translation. http://media12.connectedsocialmedia.com/intel/06/8647/Enabling_Multilingual_Collaboration_Machine_Translation.pdf, 2012.