Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas

January 15th, 2015
Throughputs: What is Behind Productive Post-Editing?

• What circumstances or variables most reliably facilitate good-quality, highly productive post-editing?
• Do conditions and parameters outside the post-editor’s control facilitate or hamper his or her success?

Welocalize Language Tools Team
• Implementation and management of Machine Translation programs
• Analysis and research

The Database
Data gathered from 2013 to date
Objective:
Establish correlations between 3 evaluation approaches to:
- draw conclusions on predicting productivity gains in advance
- see how & when to use the different metrics best
Contents:
- Content Type
- Language Pair (English into XX)
- MT engine provider & owner (i.e. who owns training & maintenance)
- Metrics (BLEU & PE Distance, Adequacy & Fluency, Productivity deltas)
- MT error analysis
- Final QA scores
- Level of experience of resource doing productivity test
The throughputs and productivity study is carried out as part of a wider study that aims to gain
understanding of and insight into Machine Translation data, with the goal of making educated
business decisions for the future.

The Database
Data used
• 37 locales in total, with varying amounts of available data
• 11 different MT systems (SMT / Hybrid)
[Chart: data used by content type: Marketing, Patents, Support, Tech. Doc., UA, UI, other]

Throughputs
The setup
• The throughput data used in this presentation is a by-product of Welocalize’s productivity tests
• Throughputs per hour
• Translation from scratch: no translation memory was leveraged for the translation part of the test
• 185 samples
• 13 different accounts
• 6 generic categories
• 11 different machine translation engines (statistical and hybrid); all of the engines have been customized
• Linguists: at least three years of experience on the specific content type + previous exposure to post-editing

Translation versus Post-editing
The data
Note: All resources that have taken part in productivity tests are represented
in these two graphics.
These graphs include all languages, content types and MT engines used
during the tests.

Translation versus Post-editing
The difference
When we combine the data from the previous two graphs, we see that not all
resources improve equally (or at all) when switching from translation to
post-editing.
[Chart: Comparison between Translation and Post-editing Throughputs]

Productivity Tests
The iOmegaT environment
• Post-Editing versus Human Translation
• Tests performed to validate predictive findings
• Tool: iOmegaT, an instrumented version of the open source CAT tool OmegaT, developed in collaboration with John Moran (CNGL)
• iOmegaT tracks time spent editing segments, editing behaviour & activity
• Closely mimics translators’ usual work environment: integrated glossary, concordance, etc., and compatible with 3rd party tools for language quality checks
• Translators can visit a segment several times if they change their mind later during translation, need to implement global changes, etc.
• Test sets consist of a mix of machine-translated segments to post-edit and no-match segments to translate from scratch
• Usual scope is 8h of translation / post-editing
• Provides the productivity delta between post-edited and translated words
Note: high throughputs need to be interpreted within the context of this test environment
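
As a rough illustration of how a productivity delta can be derived from timed segments, here is a minimal sketch. It assumes the delta is the percentage increase of post-editing throughput over from-scratch translation throughput. The `SegmentLog` record and its field names are hypothetical and do not reflect iOmegaT's actual log format, and the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SegmentLog:
    """One timed segment. Field names are hypothetical, not iOmegaT's schema."""
    source_words: int
    seconds_spent: float  # total editing time, summed over all visits
    is_post_edit: bool    # True = MT post-editing, False = translated from scratch


def throughput_wph(segments):
    """Words per hour over a group of segments."""
    words = sum(s.source_words for s in segments)
    seconds = sum(s.seconds_spent for s in segments)
    if seconds <= 0:
        raise ValueError("no editing time recorded")
    return words * 3600.0 / seconds


def productivity_delta(segments):
    """Percentage increase of post-editing throughput over translation throughput."""
    pe_wph = throughput_wph([s for s in segments if s.is_post_edit])
    ht_wph = throughput_wph([s for s in segments if not s.is_post_edit])
    return (pe_wph - ht_wph) / ht_wph * 100.0


# Made-up numbers for illustration only (not taken from the study).
log = [
    SegmentLog(source_words=20, seconds_spent=120.0, is_post_edit=False),
    SegmentLog(source_words=30, seconds_spent=180.0, is_post_edit=False),
    SegmentLog(source_words=25, seconds_spent=100.0, is_post_edit=True),
    SegmentLog(source_words=35, seconds_spent=140.0, is_post_edit=True),
]
delta = productivity_delta(log)
print(f"Productivity delta: {delta:.1f}%")
```

In this made-up log, translation runs at 600 words per hour and post-editing at 900 words per hour, so the delta is 50%. Summing time over all visits to a segment matters because translators can return to a segment several times.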

Evaluation Data
A sample

MT engine: MS Hub

Group                 Metric                  pt-BR     de-DE
Productivity Results  Productivity Delta (%)  73.8%     22.9%
Human Evaluation      Adequacy Score          3.65      3.88
Human Evaluation      Fluency Score           3.42      3.48
LQA                   LQA                     99.04%    99.75%
Automatic Scores      BLEU                    65.74     40.76
Automatic Scores      NIST                    9.30      6.69
Automatic Scores      TER                     21.14     46.30
Automatic Scores      Meteor                  73.95     55.45
Automatic Scores      Precision               81.04     70.03
Automatic Scores      Recall                  80.19     68.13
Automatic Scores      GTM                     69.07     48.96
Automatic Scores      PE Distance             26.00%    34.23%

Data from a sample evaluation – example of evaluation criteria
• The productivity delta represents the percentage increase from the average HT throughput when post-editing
• Good correlation between productivity results and automatic scores
• In spite of the 20-point BLEU/METEOR/GTM difference in the engines, there are productivity gains in both
• The results reflect the differences between language groups well
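
The score gap mentioned above can be checked directly against the two sample rows. The short sketch below copies the BLEU, Meteor and GTM values and the productivity deltas from the sample data; the variable and function names are just illustrative.

```python
# Automatic scores for the two sample rows (MS Hub engine), copied from the sample data.
scores = {
    "pt-BR": {"BLEU": 65.74, "Meteor": 73.95, "GTM": 69.07},
    "de-DE": {"BLEU": 40.76, "Meteor": 55.45, "GTM": 48.96},
}
productivity_delta = {"pt-BR": 73.8, "de-DE": 22.9}  # percent, from the sample data


def score_gaps(a, b):
    """Per-metric difference a - b, rounded to two decimals."""
    return {metric: round(a[metric] - b[metric], 2) for metric in a}


gaps = score_gaps(scores["pt-BR"], scores["de-DE"])
for metric, gap in gaps.items():
    print(f"{metric}: {gap:+.2f} points")

# Both locales still show a positive productivity delta despite the gap.
both_gain = all(delta > 0 for delta in productivity_delta.values())
print("Productivity gain in both locales:", both_gain)
```

The gaps come out at roughly 25 points for BLEU, 18.5 for Meteor and 20 for GTM, which is consistent with the "20-point" figure quoted on the slide.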

Throughputs
The trend
Trend 1: higher translation throughputs generally correlate with a lower productivity
delta, as the corresponding post-editing throughputs might not be significantly higher
• Previous post-editing studies have also highlighted this phenomenon (Guerberof, Plitt & Masselot)
Average productivity delta: 23.14%
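
The arithmetic behind this trend can be sketched with two hypothetical translators. The throughput figures below are invented purely for illustration and are not data from the study. They only show that when post-editing throughputs are similar, the translator who was already fast at translation ends up with the smaller percentage gain.

```python
def productivity_delta(ht_wph, pe_wph):
    """Percentage increase of post-editing (PE) over translation (HT) throughput."""
    return (pe_wph - ht_wph) / ht_wph * 100.0


# Hypothetical words-per-hour figures: (translation, post-editing).
translators = {
    "A (slower at translation)": (400.0, 700.0),
    "B (faster at translation)": (700.0, 800.0),
}

deltas = {name: productivity_delta(ht, pe) for name, (ht, pe) in translators.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:.1f}%")
```

Translator B post-edits faster than A in absolute terms, yet B's delta is far smaller because it is measured against a much higher translation baseline.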

Who benefits from Post-editing?
Analysis by Language and Content type
Languages selected for this analysis:
- Brazilian Portuguese
- French
- German
- Italian
- Japanese
- Latin-American Spanish
- Polish
- Russian
- Simplified Chinese
- Spanish
Content Types: Marketing, Patents, Support, Technical Documentation, UI

Language complexity grouping for MT PE
[Table: MT PE reference table]

Romance Languages
• Romance languages are the group that usually renders the highest productivity gains.
• Within Romance languages, Latin American Spanish and Brazilian Portuguese are often the ones with the highest productivity gains from the point of view of PE.
[Chart: average productivity delta for ES_LA, IT, ES, FR, PT_BR; data labels: 38%, 32%, 29%, 26%, 23%]

German and Slavic Languages
• German and Slavic languages are considered medium-complexity languages
• Availability of training resources and post-editors makes these languages a good fit for MT PE
[Chart: average productivity delta for RU, PL; data labels: 17%, 15%]

Asian Languages
• Asian languages are considered complex from the point of view of MT.
• Productivity gains depend on the translators’ working methods and their expertise in PE.
• Simplified Chinese can render high productivity gains, as shown in the graph.
[Chart: average productivity delta for JP and ZH CN; data labels: 14%, 6%]

Content types
Marketing
[Chart label: Average Productivity delta - ZH CN, 6%]
• Marketing remains a challenging content type for post-editing due to high quality expectations and free style. However, productivity gains can still be realised with well-trained MT systems and content that is not transcreation.

Content types
Technical Documentation
• Technical Documentation is a good content type for MT PE.
• Characteristics: constrained, often structured language; human-quality translation expectations but without added style and voice requirements.

Content types
Support
• Support: knowledge-base content, technical blogs, procedural articles, Q&A, etc.
• More relaxed quality expectations make this type of content very suitable for Machine Translation.
• In some instances this content is suitable for raw MT publishing when a customized engine is used.

Content types
Other content types
[Chart: average productivity delta for Patents, UI; data labels: 15%, 18%]
User Generated Content
• Highly productive due to low number of touch points during post-editing
• Examples: travel and consumer reviews, blogs
• Quality expectations are very relaxed
• Only fidelity to the original meaning is required
• No terminology checks or cosmetic changes are necessary
• Very high expected throughputs: from 500 to 1,000 words per hour
• Also suitable for raw MT publishing when a customized engine is used

Quality
Misconceptions
The idea that high throughputs harm quality in MT post-editing is inaccurate.
Sometimes linguistic issues appear more frequently in translated segments
and in fuzzy matches than in post-edited segments.

Examples of good quality and high throughputs

Language  MT (words/hr)  LQA Percentage
ja_JP     441            99.89%
es_LA     492            99.60%
pl_PL     644            99.91%
sk_SK     769            99.50%
hu_HU     847            99.73%

Post-editing
Other factors
Years of experience
In a recent survey…
• Most respondents have more experience with translation than with post-editing
• The overall correlation between translation experience and post-editing experience is “strong”
However, looking at correlations by locale:
- German: very strong
- French: weak
- Japanese: weak
- Brazilian Portuguese: strong
- Hungarian: weak
• This suggests that only for German and Brazilian Portuguese does overall experience as a professional translator (whether junior or senior) give us insight into how much post-editing experience to expect. For the other 3 locales, profiles are more varied.

Post-editing
Other factors
- Experience working on a certain content type: most linguists used for productivity tests are very experienced in translating / post-editing the tested content type
- No clear trend with regard to professional background, such as freelance or staff translator, content type experience, etc.
- No clear trend in relation to working environment (office / at home, etc.)
Text input methods:
• French and German translators seem to make more use of CAT tool shortcuts
• Japanese requires the use of Input Method Editors and less use of shortcuts

Final conclusions
• Based on our findings, Romance languages are the best performers in MT PE
• All content types are suitable for MT PE, with the exception of Transcreation; Technical Documentation and Technical Support are two of the most suitable (apart from UGC).
• Not all translators improve at the same pace when moving to post-editing
• Productivity increases most in individuals with average translation throughputs
• Knowledge of the subject matter helps achieve high throughputs
• It is more difficult to foresee post-editing effort than to assess the quality of raw MT. The human effort is still the most variable aspect.
• There is no quality degradation in MT PE

Questions and answers
Any questions?
Laura Casanellas, WL Language Tools
laura.casanellas@welocalize.com
