TAUS Quality Evaluation Summit - 28 May 2015, Dublin
TAUS proposes a new metric to measure productivity, called the TAUS Efficiency Score, which is intended to replace traditional productivity measurement. Productivity is widely used to measure the throughput of translators or to quantify the quality of MT engines. While the productivity score is a good first performance indicator, it ignores a number of factors that should be taken into account when assessing productivity and quality (e.g. final quality of the translation, edit distance, difficulty of the source, etc.). We propose the Efficiency Score, which can be applied to every translation project: translation from scratch, translation with translation memory, PEMT, or a mix of these three. The Efficiency Score is flexible in that the number of variables used to calculate it, and the way the different measurements are taken into account, vary based on user requirements and the available data.
6. Edit Distance
Inholland University of Applied Sciences
Levenshtein distance
The Levenshtein distance measures how many operations are necessary to turn one sentence into another.
It is the minimum number of single-character edits (insertion, deletion, replacement) needed.
Wagner & Fischer algorithm
A well-known dynamic-programming algorithm that computes the Levenshtein distance.
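As an illustration of the slide above, here is a minimal Python sketch of the Wagner & Fischer dynamic-programming approach. It is not the implementation used in the presented work; the function name `levenshtein` and the two-row memory layout are choices made here for brevity.

```python
def levenshtein(source: str, target: str) -> int:
    """Return the Levenshtein distance between two strings.

    Wagner-Fischer dynamic programming, keeping only two rows of the
    distance matrix in memory at any time.
    """
    # Row 0: turning the empty prefix of `source` into each prefix of
    # `target` takes one insertion per character.
    previous = list(range(len(target) + 1))

    for i, s_char in enumerate(source, start=1):
        # Column 0: turning the first i characters of `source` into the
        # empty string takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # replacement (or match, cost 0)
            ))
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))  # → 3
```

In a post-editing setting, `source` would be the pre-translated (MT) segment and `target` the corrected segment, so a larger value suggests more character-level editing.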
12. Limitations and further work
More data for benchmarking
From relative to absolute scores
0 score theoretically possible = discouraging
Eliminating outliers
Additional variables to include
13. Additional variables to include
Keystrokes – number of keystrokes
Mouse clicks – number of clicks
TM fuzzy – 0-100%
MT confidence – 0-100%
Quality – Review, automatic QA or manual
QE
Difficulty of Source
Experience – number of words produced
14. This slide may not be used or copied without permission from TAUS
Editor's notes
But it is not only the edit distance that can give valuable information. More data can be extracted from the database and used to create interesting metrics:
The sentence length …
The post-editing time …
The post-editing quality …
The words per hour that a translator can post-edit can be calculated from the time spent post-editing together with the sentence length. Knowing the edit distance and the post-editing quality that was required gives a clearer view of each translator.
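The words-per-hour calculation described above can be sketched as follows. This is only an illustration: the sample segments and timings are made up, and counting words by splitting on whitespace is an assumption, since the notes do not say how sentence length is measured.

```python
def count_words(text: str) -> int:
    """Sentence length in words (assumption: whitespace-separated tokens)."""
    return len(text.split())


def words_per_hour(word_count: int, seconds: float) -> float:
    """Throughput in words per hour, given a word count and time in seconds."""
    if seconds <= 0:
        raise ValueError("post-editing time must be positive")
    return word_count * 3600.0 / seconds


# Hypothetical post-edited segments: (text, post-editing time in seconds).
segments = [
    ("The printer is out of paper.", 12.0),
    ("Please restart the device and try again.", 18.0),
]

total_words = sum(count_words(text) for text, _ in segments)
total_seconds = sum(seconds for _, seconds in segments)
throughput = words_per_hour(total_words, total_seconds)

print(round(throughput))  # → 1560
```

Summing words and seconds over all segments before dividing weights each segment by its length, which avoids very short segments dominating an average of per-segment rates.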
Starting with the findings, I would like to explain the main algorithm that was used to extract an important metric from the work of the translators.
When translators work on a post-editing project, they have to correct a pre-translated text so that it reaches certain standards.
Comparing the pre-translated text with the corrected text, and finding the changes that were made in the process, gives information about the effort that was required.
The method chosen for this comparison is the Levenshtein distance, implemented with the Wagner & Fischer algorithm. The Levenshtein distance measures how many operations are necessary to turn one sentence into another: it is the number of single-character edits needed.