The document discusses machine translation quality estimation. It begins by outlining issues with current automatic evaluation metrics for machine translation, such as their dependence on reference translations and inability to account for the severity of errors. It then introduces the concept of quality estimation, which aims to predict translation quality before post-editing by using machine learning on examples of source texts paired with automatic translations and human-assigned quality scores. Examples are given showing quality estimation can help prioritize sentences for post-editing and select the highest quality translation from multiple systems. The state-of-the-art in quality estimation is described as using a variety of linguistic features and learning algorithms, though available datasets with human quality judgments are limited.
Estimativa da qualidade da tradução automática (Quality Estimation of Machine Translation)
Lucia Specia
University of Sheffield
l.specia@sheffield.ac.uk
Faculdade de Letras da Universidade do Porto
13 May 2013
Outline
1 Quality of Machine Translation
2 Quality Estimation
3 Open issues
4 Conclusions
Introduction
Machine Translation:
Around since the early 1950s
Increasingly popular since the 1990s: statistical approaches
Software tools and data available to build translation systems: Moses and others
Increasing demand for cheaper and faster translations
How do we measure quality and progress over time?
So far... mostly automatic evaluation metrics
MT evaluation metrics
N-gram matching between system output and one or more reference translations: BLEU and many others
Issue 1: Too many possible good-quality translations; thousands of references would be needed to capture valid variations
Solution: HyTER (Language Weaver), an annotation tool to generate all possible correct translations! [DM12]
Translations built bottom-up from word/phrase translation equivalents using finite-state automata (FSA)
2-2.5 hours of expert annotation per sentence
One annotator: 5.2 × 10^6 paths
Several annotators combined: 8.5 × 10^11 paths
Issue 2: Difficult to quantify the severity of mismatching n-grams
ref: Do not buy this product, it's their craziest invention!
sys: Do buy this product, it's their craziest invention!
Some attempts to weight mismatches differently: a sparse, lexicalised approach
However, the same error is more or less important depending on the user or purpose:
Severe if the end user does not speak the source language
Trivial for translators to post-edit
Conversely:
ref: The battery lasts 6 hours and it can be fully recharged in 30 minutes.
sys: Six-hours battery, 30 minutes to full charge last.
OK for gisting: meaning preserved
Very costly for post-editing if style is to be preserved
Task-based evaluation
Measure translation quality within a task. E.g. Autodesk: productivity test through post-editing [Aut11]
2-day translation and post-editing exercise, 37 participants
In-house Moses system (Autodesk data: software)
Time spent on each segment
E.g. Intel: user satisfaction with unedited MT
A translation is good if the customer can solve their problem
MT for customer support websites [Int10]
Overall customer satisfaction: 75% for English→Chinese
95% reduction in cost
Project cycle from 10 days to 1 day
From 300 to 60,000 words translated per hour
Customers in China using MT texts were more satisfied with support than native speakers using the original texts (68%)!
MT for chat and community forums [Int12]
~60% "understandable and actionable" (→English/Spanish)
At most ~10% "not understandable" (→Chinese)
Quality Estimation
Overview
Metrics either depend on references or on post-editing/use of translations (task-based)
Our proposal: quality assessment without references, prior to post-editing/use of translations
Why don't translators use (more) MT?
Translations are not good enough!
What about TMs? Aren't fuzzy matches useful?
Framework
Quality estimation (QE): provide an estimate of quality for new translated text *before* it is post-edited
Quality = post-editing effort
No access to reference translations: machine learning techniques to predict post-editing effort scores
Considers interaction with TM systems: MT only used for low fuzzy-match cases, or to select between TM and MT
QTLaunchPad project: Multidimensional Quality Metrics for MT and HT, for manual and (semi-)automatic evaluation (QE): http://www.qt21.eu/launchpad/
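The framework above can be sketched as a small supervised-learning loop. This is a minimal, hypothetical illustration: the two features and the 1-nearest-neighbour learner stand in for the much richer indicator sets and algorithms used in real QE systems, and all data is made up.

```python
# Minimal sketch of the QE framework: learn to predict a post-editing-effort
# score from (source, translation) pairs annotated with quality scores.
# Features, data and the learner are toy stand-ins, not the real QuEst setup.

def features(source, translation):
    """Two toy quality indicators: length ratio and token-type ratio."""
    src, tgt = source.split(), translation.split()
    length_ratio = len(tgt) / max(len(src), 1)
    type_token_ratio = len(set(tgt)) / max(len(tgt), 1)
    return [length_ratio, type_token_ratio]

def predict(train, x):
    """1-nearest-neighbour regression: score of the closest training example."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(train, key=lambda ex: dist(ex[0], x))[1]

# Training examples: (feature vector, human PE-effort score on a 1-5 scale)
train = [
    (features("the cat sat", "le chat était assis"), 4.0),
    (features("press the button", "appuyez bouton bouton"), 2.0),
]

# Estimate the quality of a new, unseen translation (no reference needed).
score = predict(train, features("close the door", "fermez la porte"))
```

In practice the learner would be a regression algorithm trained on thousands of annotated segments; the point is only that no reference translation is consulted at prediction time.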
Framework
[Diagram: a source text goes into an MT system, producing a translation; the QE system extracts quality indicators from the source and the translation and outputs a quality score. The QE system is trained on examples of source texts and translations paired with quality scores.]
Examples of positive results
Time to post-edit the subset of sentences predicted as "good" (low effort) vs time to post-edit a random subset of sentences:

Language   no QE            QE
fr-en      0.75 words/sec   1.09 words/sec
en-es      0.32 words/sec   0.57 words/sec

Accuracy in selecting the best translation among 4 MT systems:

Best MT system   Highest QE score
54%              77%
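The second result (selecting among MT systems) reduces to taking, per sentence, the candidate whose predicted QE score is highest. A sketch with made-up system names, translations, and scores:

```python
# Pick, for one source sentence, the translation whose predicted QE score is
# highest. System names, translations and scores are illustrative only.

def select_best(candidates):
    """candidates: list of (system_name, translation, predicted_qe_score)."""
    return max(candidates, key=lambda c: c[2])

candidates = [
    ("sys1", "translation A", 0.62),
    ("sys2", "translation B", 0.81),
    ("sys3", "translation C", 0.47),
    ("sys4", "translation D", 0.75),
]
best = select_best(candidates)  # sys2's translation wins here
```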
State-of-the-art
Quality indicators:
[Diagram: complexity indicators extracted from the source text, confidence indicators from the MT system, fluency indicators from the translation, and adequacy indicators from the source-translation pair]
Learning algorithms: wide range
Datasets: few with absolute human scores (1-4/5 scores, PE time, edit distance)
Open issues
State-of-the-art indicators
Shallow indicators:
(S/T/S-T) Sentence length
(S/T) Language model
(S/T) Token-type ratio
(S) Average number of possible translations per word
(S) % of n-grams belonging to different frequency quartiles of a source-language corpus
(T) Untranslated/OOV words
(T) Mismatching brackets, quotation marks
(S-T) Preservation of punctuation
(S-T) Word alignment score, etc.
These do well for estimating post-editing effort...
...but are not enough for other aspects of quality, e.g. adequacy
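Several of the shallow indicators above can be computed with nothing but string operations. A toy sketch (the vocabulary set stands in for a real target-language lexicon; LM and alignment scores are omitted because they need trained models):

```python
import string

def shallow_indicators(source, translation, tgt_vocab):
    """A handful of the shallow QE indicators listed above (toy version)."""
    src_tokens = source.split()
    tgt_tokens = translation.split()
    words = [w.strip(string.punctuation).lower() for w in tgt_tokens]
    return {
        "src_length": len(src_tokens),                                       # (S)
        "tgt_length": len(tgt_tokens),                                       # (T)
        "type_token_ratio": len(set(tgt_tokens)) / max(len(tgt_tokens), 1),  # (T)
        "oov_count": sum(w not in tgt_vocab for w in words if w),            # (T)
        "brackets_match": translation.count("(") == translation.count(")"),  # (T)
        "punct_preserved": sum(c in string.punctuation for c in source)      # (S-T)
                           == sum(c in string.punctuation for c in translation),
    }

vocab = {"le", "chat", "dort", "sur", "la", "chaise"}
feats = shallow_indicators("the cat sleeps on the chair.",
                           "le chat dort sur la chaise.", vocab)
```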
Linguistic indicators - count-based:
(S/T/S-T) Content/non-content words
(S/T/S-T) Nouns/verbs/... NP/VP/...
(S/T/S-T) Deictics (references)
(S/T/S-T) Discourse markers (references)
(S/T/S-T) Named entities
(S/T/S-T) Zero subjects
(S/T/S-T) Pronominal subjects
(S/T/S-T) Negation indicators
(T) Subject-verb / adjective-noun agreement
(T) Language model of POS tags
(T) Grammar checking (dangling words)
(T) Coherence
Linguistic indicators - alignment-based:
(S-T) Correct translation of pronouns
(S-T) Matching of dependency relations
(S-T) Matching of named entities
(S-T) Alignment of parse trees
(S-T) Alignment of predicates & arguments, etc.
Some indicators are language-dependent; others need language-dependent resources but apply to most languages, e.g. an LM of POS tags
Fine-grained, lexicalised indicators:
target-word = "process": 1 if source-word = "hdhh alamlyt", 0 otherwise
target-word = "process": 1 if source-pos = "DT DTNN", 0 otherwise
Closer to error detection
Need large amounts of training data [BHAO11], or rule-based approaches
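The two indicator templates above are sparse binary features: each fires only for one specific target-word/source-context pair. A minimal sketch of how such features could be generated (the feature-name format is illustrative, not from any particular toolkit):

```python
# Sparse binary lexicalised indicators, as in the templates above: a feature
# has value 1 only for its exact target-word/source-context combination.
# The feature-name format is made up; real systems generate millions of these.

def lexicalised_features(target_word, source_word, source_pos):
    """Return the firing features as a dict of name -> 1 (absent means 0)."""
    return {
        "tgt=%s|src=%s" % (target_word, source_word): 1,
        "tgt=%s|src_pos=%s" % (target_word, source_pos): 1,
    }

feats = lexicalised_features("process", "hdhh alamlyt", "DT DTNN")
```

Storing only the firing features keeps the representation tractable despite the enormous feature space.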
Do these indicators work?
To some extent... Issues:
Representation of shallow/deep indicators: counts, ratios, (absolute) differences?
F = S − T, F = |S − T|, F = T/S, F = (S − T)/S, ...
Resources to extract deep indicators: availability and reliability
Data to extract fine-grained indicators: need previously translated and post-edited data, esp. for negative examples
Manual scoring: agreement between translators
Absolute value judgements: difficult to achieve consistency across annotators even in a highly controlled setup
en-es news WMT12 dataset: 3 professional translators, 1-5 scores
15% of the initial dataset discarded: annotators disagreed by more than one category
Remaining annotations had to be scaled (0.33, 0.17, 0.50)
en-pt subtitles of TV series: 3 non-professional annotators, 1-4 scores
351 cases (41%): full agreement
445 cases (52%): partial agreement
54 cases (7%): null agreement
Agreement by score:

Score   Full agreement
4       59%
3       35%
2       23%
1       50%
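The full/partial/null categories above follow directly from how many distinct scores the three annotators gave per segment. A sketch with invented annotations:

```python
# Classify per-segment agreement among three annotators, as in the counts
# above: "full" = all three scores equal, "partial" = exactly two equal,
# "null" = all different. The example annotations are invented.
from collections import Counter

def agreement(scores):
    distinct = len(set(scores))
    return {1: "full", 2: "partial", 3: "null"}[distinct]

segments = [(3, 3, 3), (2, 3, 3), (1, 2, 4)]
counts = Counter(agreement(s) for s in segments)
```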
More objective ways of annotating translations
HTER: edit distance between the MT output and its minimally post-edited version

HTER = #edits / #words in the post-edited version

Edits: substitute, delete, insert, shift
Analysis by Maarit Koponen (WMT-12) on post-edited translations with HTER and 1-5 scores
A number of cases where translations with low HTER (few edits) were assigned low quality scores (high post-editing effort), and vice-versa
Certain edits seem to require more cognitive effort than others - not captured by HTER
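The HTER formula above can be approximated with a word-level Levenshtein distance. This sketch counts substitutions, deletions, and insertions only; real HTER (computed via the TER metric) also counts block shifts as single edits, which is omitted here for brevity:

```python
def hter(mt_output, post_edited):
    """Approximate HTER: word-level edit distance between the MT output and
    its post-edited version, divided by the post-edited length (shifts omitted)."""
    hyp, ref = mt_output.split(), post_edited.split()
    # Standard dynamic-programming Levenshtein over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # substitute / match
    return d[len(hyp)][len(ref)] / len(ref)

# One inserted word ("not") over a 5-word post-edited sentence: HTER = 1/5.
score = hter("Do buy this product", "Do not buy this product")
```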
TIME: varies considerably across translators (expected)
[Charts: post-editing time per segment, in seconds and in seconds per word, over 20 segments for 8 annotators (A1-A8)]
Can we normalise this variation?
A dedicated QE system for each translator?
Time, HTER, keystrokes: data from 8 post-editors
PET: http://pers-www.wlv.ac.uk/~in1676/pet/
How to use estimated PE effort scores?
Should (supposedly) bad-quality translations be filtered out or shown to translators (with different scores/colour codes, as in TMs)?
Wasting time reading scores and translations vs wasting "gisting" information
How to define a threshold on the estimated translation quality to decide what should be filtered out?
Translator-dependent
Task-dependent (SDL)
Do translators prefer detailed estimates (sub-sentence level) or an overall estimate for the complete sentence?
Too much information vs hard-to-interpret scores
Conclusions
It is possible to estimate at least certain aspects of MT quality, esp. wrt PE effort: QuEst http://quest.dcs.shef.ac.uk/
PE effort estimates can be used in real applications
Ranking translations: filter out bad-quality translations
Selecting translations from multiple MT systems
Commercial products by SDL (document-level, for gisting) and Multilizer
A number of open issues to be investigated...
Collaboration with "human translators" is essential
My vision: sub-sentence-level QE (error detection), highlighting errors but also giving an overall estimate for the sentence
References
[Aut11] Autodesk. Translation and Post-Editing Productivity. http://translate.autodesk.com/productivity.html, 2011.
[BHAO11] Nguyen Bach, Fei Huang, and Yaser Al-Onaizan. Goodness: a method for measuring machine translation confidence. Pages 211-219, Portland, Oregon, 2011.
[DM12] Markus Dreyer and Daniel Marcu. HyTER: meaning-equivalent semantics for translation evaluation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 162-171, Montréal, Canada, 2012.
[Int10] Intel. Being Streetwise with Machine Translation in an Enterprise Neighborhood. http://mtmarathon2010.info/JEC2010_Burgett_slides.pptx, 2010.
[Int12] Intel. Enabling Multilingual Collaboration through Machine Translation. http://media12.connectedsocialmedia.com/intel/06/8647/Enabling_Multilingual_Collaboration_Machine_Translation.pdf, 2012.