3. What do we aim to cover?
The MT & Quality Relationship
What is quality?
Possible ways of measuring it
Automated evaluation methods
Who needs to measure quality?
Localisation stakeholders
Conclusion
Machine Translation & Quality
4. The Quality & MT Relationship
5. Attributes of Quality
Language Attributes
Adequacy
Accuracy of generated texts
Based on word recall & precision
Fluency
Comprehensibility of texts
Readability, understandability
Based on phrase reuse and assembly
Task-oriented Attributes
Productivity
Post-editing speed
Acceptability
Fit-for-purpose measurement
Usable translations within the context of the end user
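The "word recall & precision" basis for Adequacy can be illustrated with a minimal sketch. It assumes whitespace tokenisation, lowercasing, and a single reference translation; the function name `word_precision_recall` is hypothetical and does not correspond to any particular metric implementation.

```python
from collections import Counter


def word_precision_recall(candidate: str, reference: str):
    """Bag-of-words precision, recall and F-measure of candidate vs. reference.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Words shared by both texts, counted with multiplicity.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative sentence pair (made up for this sketch).
p, r, f = word_precision_recall("the cat sat on mat", "the cat sat on the mat")
```

Here every candidate word appears in the reference, so precision is 1.0, while recall is 5/6 because one reference word is missing from the candidate.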
6. Automated Evaluations
Many different techniques available
All compute similarity of generated texts to reference texts
The smaller the difference => the better the quality!
[Figure: metrics NIST, GTM, F-Measure, TER, BLEU and METEOR placed along Language and Task axes, with attribute labels Fluency, Adequacy, Usability, Productivity and Acceptability]
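The principle on this slide, that a smaller difference from the reference text means better quality, can be sketched as a word-level edit rate. This is a simplified illustration only: it counts insertions, deletions and substitutions, normalised by reference length, and leaves out the block shifts that TER proper also allows. The function name `word_edit_rate` is hypothetical.

```python
def word_edit_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: whitespace tokenisation, one reference, no shift operations.
    Lower is better; 0.0 means the hypothesis matches the reference exactly.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] = edits needed to turn the first i-1 hypothesis words
    # into the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1] / max(len(ref), 1)


# Illustrative sentence pair (made up for this sketch).
rate = word_edit_rate("the cat sat on mat", "the cat sat on the mat")
```

One missing word against a six-word reference gives a rate of 1/6; an identical hypothesis gives 0.0.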
7. Who needs to measure Quality?
The Localisation Stakeholder Dilemma
Developers of MT Engines
Automated metrics (BLEU, METEOR, F-Measure, TER) are ideal and practical
No individual measurement has absolute meaning,
but each shows whether quality is moving in the right direction within a domain
8. Who needs to measure Quality?
The Localisation Stakeholder Dilemma
Production Teams (PMs, LEs and QEs)
Need segment-level measurements of quality and post-editing (PE) effort
Determine tiered segment post-edit rate
Distribution of post-editing tasks based on segment quality
Localisation Managers
Need productivity measurements to predict budget and schedule
Also known as Project Segment Reports
MT Measurements need to ‘fit’ business planning and charge models
Translators
Unfortunately, don’t get a fair deal
No segment-level information, just top-level project data
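The tiered post-edit idea and the link to scheduling can be sketched as follows. Everything numeric here is an assumption made for illustration: the 0 to 1 quality score, the tier thresholds, the tier names and the words-per-hour rates are hypothetical and are not taken from the slides or from any real project.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seg_id: int
    words: int
    quality: float  # hypothetical segment quality score in [0, 1]


# Hypothetical tiers: (minimum quality, tier name, assumed PE words per hour),
# ordered from best to worst quality.
TIERS = [(0.8, "light", 1200), (0.5, "medium", 700), (0.0, "full", 350)]


def tier_for(quality: float) -> str:
    """Return the first tier whose minimum quality the segment reaches."""
    for minimum, name, _rate in TIERS:
        if quality >= minimum:
            return name
    return TIERS[-1][1]


def estimate_hours(segments):
    """Sum the estimated post-editing hours per tier."""
    rates = {name: rate for _minimum, name, rate in TIERS}
    hours = {name: 0.0 for name in rates}
    for seg in segments:
        name = tier_for(seg.quality)
        hours[name] += seg.words / rates[name]
    return hours


# Illustrative project of three segments (made-up values).
project = [Segment(1, 120, 0.9), Segment(2, 70, 0.6), Segment(3, 35, 0.2)]
report = estimate_hours(project)
```

A per-tier report of this kind is one possible shape for segment-level data that production teams could use to distribute post-editing tasks and that managers could roll up into a schedule.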
10. Conclusions
There are many automated MT quality measurements
Mostly suitable for MT developers
Not optimal for production teams
Of no use to translators
All rely on reference texts to compute measurements
What’s needed?
Segment level measurements
Drive project schedule and charge model
High correlation to human effort
Do not rely on reference texts to compute measurements