SlideShare une entreprise Scribd logo
1  sur  14
TAUS Best Practices
Evaluating Post-Editor Performance Guidelines
January 2014
Why do we need guidelines for Post-Editor
performance?
Machine translation (MT) with post-editing is fast becoming a standard
practice in our industry. This means that organizations need to be able to
easily identify, qualify, train and evaluate Post-Editor performance.
Today, there are many methodologies in use, resulting in a lack of cohesive
standards as organizations take various approaches for evaluating
performance. Some use Final Output Quality Evaluation or Post-Editor
Productivity as a standalone metric. Others analyze quality data such as
“over-edit” or “under-edit” of the Post-Editor’s effort or evaluate the
percentage of MT suggestions used versus MT suggestions that are discarded
in the final output.
An agreed set of best practices will help the industry fairly and efficiently
select the most suitable talent for post-editing work and identify the training
opportunities that will help translators and new players, such as
crowdsourcing resources, become highly skilled and qualified Post-Editors.
Related guidelines
These guidelines should be applied in combination with three related
TAUS Guidelines:
• Machine Translation Post-editing Guidelines, which help customers
and service providers set clear quality expectations and can be used
as a basis on which to instruct post-editors.
• Machine Translation Post-editing Productivity Testing
Guidelines, which provide guidance on the design and
measurement for both short and long-term productivity tests.
• Pricing MT-Post-Editing Guidelines, which share current wisdom on
how to arrive at a predictive, fair and appropriate pricing model.
Scope
This Guideline focuses on the analysis and
interpretation of productivity testing results
combined with the evaluation of the final postedited output and scores obtained via industrystandard automated scoring metrics in order to
evaluate post-editors’ performance. It does not
cover design and execution of post-editing
productivity tests, post-editing quality levels and
pricing recommendations.
Defining Goals
Determine the objectives of your evaluation.
The suggested goals for the translation services provider are:
•

•

Identify the best performers from the pool of post-editors, who deliver the desired level of quality output
with the highest productivity gains; identify the “ideal” post-editor profile for the specific content type and
quality requirements (linguist, domain specialist, “casual” translator).
Identify common over-edit and under-edit mistakes in order to refine post-editing guidelines and
determine the workforce training needs to achieve higher productivity.

The suggested goals for the translation services buyer are:
•

•

Gather intelligence on the performance of the in-house technology used to enable the post-editing
process, such as a translation management system, a recommended post- editing environment and MT
engines.
Based on the post-editor productivity, set the realistic TAT (turnaround time) expectations, determine the
appropriate pricing structure for specific content types and language pair and reflect the above in an SLA
(Service Level Agreement).
Structuring your Analysis
In order to select the top productivity performers and evaluate the quality of
the output using reviewers:
Select a subset of the content used for the productivity test for which the
highest and the lowest productivity is seen (the “outliers”), and evaluate the
quality of the output using reviewers and automated quality evaluation tools
(spellcheckers, Checkmate, X-bench, style consistency evaluation tools). Make
sure the final output meets your quality expectations for the selected content
types.
Use multiple translators and multiple reviewers.
Make sure there is minimal subjective evaluation. Provide clear evaluation
guidelines to the reviewers and make certain the reviewers’ expectations and
the Post-Editors’ instructions are aligned. Refer to the known most common
post-editing mistakes in your guidelines.
Examples of full post-editing known problem areas:

• Handling of measurements and locale-specific punctuation,
date formats and alike
• Correcting inconsistencies in terminology, terminology
disambiguation
• Handling of list elements, tables or headers versus body
text
• Handling of proper names, product names and other
DoNotTranslate elements
• Repetitions (consistent exact matches)
• Removing duplicates, fixing omissions (for SMT output
post-editing)
• Morphology (agreement), negations, word order, plural vs.
singular
Examples of light post-editing known problem
areas:
• Correctly conveying the meaning of the source
sentence
• Correcting inconsistencies in terminology
• Removing duplicates, fixing omissions (for
SMT output post-editing)
• Morphology (agreement), negations, word
order, plural vs. singular
In order to identify the common over-edit and
under-edit patterns:
Obtain edit distance and MT quality data for the
content post-edited during the productivity
evaluation, using the industry-standard
methods, e.g. GTM, Levenshtein, BLEU, TER.
If your productivity evaluation tool captures this
data, obtain information on the changes made by
post-editors, i.e. edit location, nature of edits; if
your tool doesn’t capture such data a simple file
difference tool can be used.
You will be able to determine realistic turnaround
time and pricing expectations based on the
productivity and quality data. Always have a clear
human translation benchmark for reference (e.g.
2000 words per day for European languages) unless
your tool allows you to capture the actual inproduction data.
For smaller LSPs and freelance translators, tight
turnaround projects or other limited bandwidth
scenarios reliance on legacy industry information is
recommended.
Analyzing Results
Make sure that the final output quality matches the desired quality
level for the selected content type, differentiate between the full postediting quality level and light post-editing quality level (refer to the
TAUS Machine Translation Post-editing Guidelines for further
clarification).
Assess individual post-editors’ performance using side-by-side data for
more than one post-editor; use the individual post-editor data to
identify the most suitable post-editor profile for the specific content
types and quality levels.
Calculate the mode (the number which appears most often in a set of
numbers) based on scores of multiple post-editors to obtain data
specific to certain content/quality/language combination; gather data
for specific sentence length ranges and sentence types.
Do not use obtained productivity data in isolation in order to calculate
expected daily throughputs and turnaround times, as the reported values
reflect the “ideal” off-production scenario; add time necessary for
terminology and concept research, administrative tasks, breaks and alike.
Identify best and worst performing sentence types (length, presence/absence
of DNT elements, tags, terminology, certain syntactic structures) and gather
best practices for post-editors on most optimal handling of such sentences
and elements.

Analyze edit distance data alongside with the MT quality evaluation data and
assessment by reviewers to determine whether the edits made by posteditors were necessary for meeting the output quality requirements, gather
best practices for specific edit types. Do not use edit distance data in isolation
to evaluate the post-editor performance.
Analyze the nature of edits using the data obtained during the productivity
tests, use top performer’s data to create recommendations for lower
performers; use the obtained information to provide feedback on the MT
engine.
Matrix
It is recommended that you create a clear matrix of post-editing productivity,
quality, TAT and pricing discount expectations based on the results of your
analysis. The proposed matrix structure is below:
• Content type and purpose of content
• Expected post-edited output quality level; risk assessment and error
tolerance by error type for the specific content
• CAT-enabled environment or isolated post-editing environment; if the
former – whether it will have an impact on the expected productivity gains
• Human translation daily throughput (use industry-standard daily
throughout expectations as a benchmark)
• Post-editing daily throughput (based on productivity gain percentages)
• Productivity gain (per hour, per day)
• Expected pricing discount based on the factors above
Our thanks to:
Olga Beregovaya (Welocalize) for drafting and editing these guidelines based on the breakout
session at the TAUS Quality Evaluation Summit.
The following organizations reviewed and gave feedback on the first version of the Guidelines at
the TAUS Quality Evaluation Summit 9 October 2013, San Jose:
2M Language Services, Adobe, AVB Vertallingen, Blackberry, CNC
Software, Inc., CrossLang, DFKI, Digital
Linguistics, eBay, EBSCO, Eventbrite, Gengo, Google, Hewlett
Packard, Intuit, KantanMT, Lionbridge, Localization Institute, MasterWord Services
Inc., Microsoft, PayPal, PTC and Zynga.
Reviewers: Achim Ruopp (TAUS), Willem Stoeller (International Consulting LLC) and Attila Görög
(TAUS)
Feedback
To give feedback on how to improve the guidelines, please write to dqf@tauslabs.com.

Contenu connexe

En vedette

En vedette (8)

"Living up to the promise..." - June 2012
"Living up to the promise..." - June 2012"Living up to the promise..." - June 2012
"Living up to the promise..." - June 2012
 
TAUS Data Association super cloud as a real time web service
TAUS Data Association super cloud as a real time web serviceTAUS Data Association super cloud as a real time web service
TAUS Data Association super cloud as a real time web service
 
TAUS Best Practices Adequacy/Fluency Guidelines
TAUS Best Practices Adequacy/Fluency GuidelinesTAUS Best Practices Adequacy/Fluency Guidelines
TAUS Best Practices Adequacy/Fluency Guidelines
 
TAUS USER CONFERENCE 2010, GeoWorkz and the TAUS Data Association
TAUS USER CONFERENCE 2010, GeoWorkz and the TAUS Data AssociationTAUS USER CONFERENCE 2010, GeoWorkz and the TAUS Data Association
TAUS USER CONFERENCE 2010, GeoWorkz and the TAUS Data Association
 
TAUS New Year's Reception 2014
TAUS New Year's Reception 2014TAUS New Year's Reception 2014
TAUS New Year's Reception 2014
 
TAUS MT SHOWCASE, The DQF Tools, Rahzeb Choudhury and Maxim Khalilov, TAUS, 1...
TAUS MT SHOWCASE, The DQF Tools, Rahzeb Choudhury and Maxim Khalilov, TAUS, 1...TAUS MT SHOWCASE, The DQF Tools, Rahzeb Choudhury and Maxim Khalilov, TAUS, 1...
TAUS MT SHOWCASE, The DQF Tools, Rahzeb Choudhury and Maxim Khalilov, TAUS, 1...
 
TAUS USER CONFERENCE 2010, Turbo-charge rule based machine translation produc...
TAUS USER CONFERENCE 2010, Turbo-charge rule based machine translation produc...TAUS USER CONFERENCE 2010, Turbo-charge rule based machine translation produc...
TAUS USER CONFERENCE 2010, Turbo-charge rule based machine translation produc...
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Luc Meertens, Crosslang...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Luc Meertens, Crosslang...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Luc Meertens, Crosslang...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Luc Meertens, Crosslang...
 

Similaire à TAUS Evaluating Post-Editor Performance Guidelines

CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...
CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...
CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...
ABBYY Language Serivces
 
TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...
TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...
TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...
TAUS - The Language Data Network
 
Manual testing interview questions and answers
Manual testing interview questions and answersManual testing interview questions and answers
Manual testing interview questions and answers
karanmca
 
Hitchhikers guide to_data_center_facility_ops
Hitchhikers guide to_data_center_facility_opsHitchhikers guide to_data_center_facility_ops
Hitchhikers guide to_data_center_facility_ops
avdsouza
 
Chowdappa Resume
Chowdappa ResumeChowdappa Resume
Chowdappa Resume
chowdappa o
 
Chowdappa Resume
Chowdappa ResumeChowdappa Resume
Chowdappa Resume
chowdappa o
 

Similaire à TAUS Evaluating Post-Editor Performance Guidelines (20)

Towards industry benchmarking of MT engines and a database of MT use cases (J...
Towards industry benchmarking of MT engines and a database of MT use cases (J...Towards industry benchmarking of MT engines and a database of MT use cases (J...
Towards industry benchmarking of MT engines and a database of MT use cases (J...
 
CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...
CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...
CAT or TMS Implementation: Calculation of the Number of Licenses and the Tota...
 
TAUS Post-Editing Productivity Guidelines
TAUS Post-Editing Productivity GuidelinesTAUS Post-Editing Productivity Guidelines
TAUS Post-Editing Productivity Guidelines
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happy
 
TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...
TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...
TAUS Roundtable Moscow, CAT or TMS Implementation-Calculation of the Number o...
 
Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...
 
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
 
TAUS Best Practices Error Typology Guidelines
TAUS Best Practices Error Typology GuidelinesTAUS Best Practices Error Typology Guidelines
TAUS Best Practices Error Typology Guidelines
 
Getting the Most from MT + PE
Getting the Most from MT + PEGetting the Most from MT + PE
Getting the Most from MT + PE
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_sa
 
Improving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case StudyImproving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case Study
 
Manual testing interview questions and answers
Manual testing interview questions and answersManual testing interview questions and answers
Manual testing interview questions and answers
 
Predictive Analysis in Machine Translation is Business Intelligence.
Predictive Analysis in Machine Translation is Business Intelligence.Predictive Analysis in Machine Translation is Business Intelligence.
Predictive Analysis in Machine Translation is Business Intelligence.
 
Hitchhikers guide to_data_center_facility_ops
Hitchhikers guide to_data_center_facility_opsHitchhikers guide to_data_center_facility_ops
Hitchhikers guide to_data_center_facility_ops
 
Test
TestTest
Test
 
Benchmarking
BenchmarkingBenchmarking
Benchmarking
 
TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2...
TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2...TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2...
TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2...
 
Chowdappa Resume
Chowdappa ResumeChowdappa Resume
Chowdappa Resume
 
Chowdappa Resume
Chowdappa ResumeChowdappa Resume
Chowdappa Resume
 
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
 

Plus de TAUS - The Language Data Network

Plus de TAUS - The Language Data Network (20)

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
 
Farmer Lv (TrueTran)
Farmer Lv (TrueTran)Farmer Lv (TrueTran)
Farmer Lv (TrueTran)
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

TAUS Evaluating Post-Editor Performance Guidelines

  • 1. TAUS Best Practices Evaluating Post-Editor Performance Guidelines January 2014
  • 2. Why do we need guidelines for Post-Editor performance? Machine translation (MT) with post-editing is fast becoming a standard practice in our industry. This means that organizations need to be able to easily identify, qualify, train and evaluate Post-Editor performance. Today, there are many methodologies in use, resulting in a lack of cohesive standards as organizations take various approaches for evaluating performance. Some use Final Output Quality Evaluation or Post-Editor Productivity as a standalone metric. Others analyze quality data such as “over-edit” or “under-edit” of the Post-Editor’s effort or evaluate the percentage of MT suggestions used versus MT suggestions that are discarded in the final output. An agreed set of best practices will help the industry fairly and efficiently select the most suitable talent for post-editing work and identify the training opportunities that will help translators and new players, such as crowdsourcing resources, become highly skilled and qualified Post-Editors.
  • 3. Related guidelines These guidelines should be applied in combination with three related TAUS Guidelines: • Machine Translation Post-editing Guidelines, which help customers and service providers set clear quality expectations and can be used as a basis on which to instruct post-editors. • Machine Translation Post-editing Productivity Testing Guidelines, which provide guidance on the design and measurement for both short and long-term productivity tests. • Pricing MT-Post-Editing Guidelines, which share current wisdom on how to arrive at a predictive, fair and appropriate pricing model.
  • 4. Scope This Guideline focuses on the analysis and interpretation of productivity testing results combined with the evaluation of the final postedited output and scores obtained via industrystandard automated scoring metrics in order to evaluate post-editors’ performance. It does not cover design and execution of post-editing productivity tests, post-editing quality levels and pricing recommendations.
  • 5. Defining Goals Determine the objectives of your evaluation. The suggested goals for the translation services provider are: • • Identify the best performers from the pool of post-editors, who deliver the desired level of quality output with the highest productivity gains; identify the “ideal” post-editor profile for the specific content type and quality requirements (linguist, domain specialist, “casual” translator). Identify common over-edit and under-edit mistakes in order to refine post-editing guidelines and determine the workforce training needs to achieve higher productivity. The suggested goals for the translation services buyer are: • • Gather intelligence on the performance of the in-house technology used to enable the post-editing process, such as a translation management system, a recommended post- editing environment and MT engines. Based on the post-editor productivity, set the realistic TAT (turnaround time) expectations, determine the appropriate pricing structure for specific content types and language pair and reflect the above in an SLA (Service Level Agreement).
  • 6. Structuring your Analysis In order to select the top productivity performers and evaluate the quality of the output using reviewers: Select a subset of the content used for the productivity test for which the highest and the lowest productivity is seen (the “outliers”), and evaluate the quality of the output using reviewers and automated quality evaluation tools (spellcheckers, Checkmate, X-bench, style consistency evaluation tools). Make sure the final output meets your quality expectations for the selected content types. Use multiple translators and multiple reviewers. Make sure there is minimal subjective evaluation. Provide clear evaluation guidelines to the reviewers and make certain the reviewers’ expectations and the Post-Editors’ instructions are aligned. Refer to the known most common post-editing mistakes in your guidelines.
  • 7. Examples of full post-editing known problem areas: • Handling of measurements and locale-specific punctuation, date formats and alike • Correcting inconsistencies in terminology, terminology disambiguation • Handling of list elements, tables or headers versus body text • Handling of proper names, product names and other DoNotTranslate elements • Repetitions (consistent exact matches) • Removing duplicates, fixing omissions (for SMT output post-editing) • Morphology (agreement), negations, word order, plural vs. singular
  • 8. Examples of light post-editing known problem areas: • Correctly conveying the meaning of the source sentence • Correcting inconsistencies in terminology • Removing duplicates, fixing omissions (for SMT output post-editing) • Morphology (agreement), negations, word order, plural vs. singular
  • 9. In order to identify the common over-edit and under-edit patterns: Obtain edit distance and MT quality data for the content post-edited during the productivity evaluation, using the industry-standard methods, e.g. GTM, Levenshtein, BLEU, TER. If your productivity evaluation tool captures this data, obtain information on the changes made by post-editors, i.e. edit location, nature of edits; if your tool doesn’t capture such data a simple file difference tool can be used.
  • 10. You will be able to determine realistic turnaround time and pricing expectations based on the productivity and quality data. Always have a clear human translation benchmark for reference (e.g. 2000 words per day for European languages) unless your tool allows you to capture the actual inproduction data. For smaller LSPs and freelance translators, tight turnaround projects or other limited bandwidth scenarios reliance on legacy industry information is recommended.
  • 11. Analyzing Results Make sure that the final output quality matches the desired quality level for the selected content type, differentiate between the full postediting quality level and light post-editing quality level (refer to the TAUS Machine Translation Post-editing Guidelines for further clarification). Assess individual post-editors’ performance using side-by-side data for more than one post-editor; use the individual post-editor data to identify the most suitable post-editor profile for the specific content types and quality levels. Calculate the mode (the number which appears most often in a set of numbers) based on scores of multiple post-editors to obtain data specific to certain content/quality/language combination; gather data for specific sentence length ranges and sentence types.
  • 12. Do not use obtained productivity data in isolation in order to calculate expected daily throughputs and turnaround times, as the reported values reflect the “ideal” off-production scenario; add time necessary for terminology and concept research, administrative tasks, breaks and alike. Identify best and worst performing sentence types (length, presence/absence of DNT elements, tags, terminology, certain syntactic structures) and gather best practices for post-editors on most optimal handling of such sentences and elements. Analyze edit distance data alongside with the MT quality evaluation data and assessment by reviewers to determine whether the edits made by posteditors were necessary for meeting the output quality requirements, gather best practices for specific edit types. Do not use edit distance data in isolation to evaluate the post-editor performance. Analyze the nature of edits using the data obtained during the productivity tests, use top performer’s data to create recommendations for lower performers; use the obtained information to provide feedback on the MT engine.
  • 13. Matrix It is recommended that you create a clear matrix of post-editing productivity, quality, TAT and pricing discount expectations based on the results of your analysis. The proposed matrix structure is below: • Content type and purpose of content • Expected post-edited output quality level; risk assessment and error tolerance by error type for the specific content • CAT-enabled environment or isolated post-editing environment; if the former – whether it will have an impact on the expected productivity gains • Human translation daily throughput (use industry-standard daily throughout expectations as a benchmark) • Post-editing daily throughput (based on productivity gain percentages) • Productivity gain (per hour, per day) • Expected pricing discount based on the factors above
  • 14. Our thanks to: Olga Beregovaya (Welocalize) for drafting and editing these guidelines based on the breakout session at the TAUS Quality Evaluation Summit. The following organizations reviewed and gave feedback on the first version of the Guidelines at the TAUS Quality Evaluation Summit 9 October 2013, San Jose: 2M Language Services, Adobe, AVB Vertallingen, Blackberry, CNC Software, Inc., CrossLang, DFKI, Digital Linguistics, eBay, EBSCO, Eventbrite, Gengo, Google, Hewlett Packard, Intuit, KantanMT, Lionbridge, Localization Institute, MasterWord Services Inc., Microsoft, PayPal, PTC and Zynga. Reviewers: Achim Ruopp (TAUS), Willem Stoeller (International Consulting LLC) and Attila Görög (TAUS) Feedback To give feedback on how to improve the guidelines, please write to dqf@tauslabs.com.