Lori Thicke, CEO, Lexcelera
Lori presents the results of a study for the SAS Institute on the impact of Global English on machine translation readiness and post-editing productivity. The study was based on John Kohl’s book, “The Global English Style Guide: Writing Clear, Translatable Documentation for a Global Audience”. It quantified the gains from pre-editing the source text according to Global English rules when translating with the Systran V.7 engine, which has a rules-based front end and a statistical back end. After a few basic rules were applied to improve the source text, post-editing productivity rose to at least 1200 words per hour, or 9600 words per day.
TAUS USER CONFERENCE 2010, Turbo-charge rule based machine translation productivity by improving your source text
1. TAUS USER CONFERENCE 2010
LANGUAGE BUSINESS INNOVATION
4 – 6 OCTOBER / PORTLAND (OR), USA
TUESDAY 5 OCTOBER / 14.45
TURBO-CHARGE RULE-BASED MACHINE
TRANSLATION PRODUCTIVITY BY IMPROVING
SOURCE TEXT
Lori Thicke, Lexcelera
2. Machine Translation is capable of
increasing speed, lowering costs and
yes, even improving quality.
3. … but not out of the box.
4. Optimization can mean:
Training the engine
Improving the engine
Improving the source
Correcting the target
8. We asked the question: How can improving the source influence post-editing productivity using Systran’s Hybrid engine?
9.
10. Relevance of each guideline to MT, human translation and non-native speakers
Impact (1-3)
11. Leader in business analytics software and services
Largest independent business intelligence
vendor
#1 on FORTUNE’s “Best Places to Work
in America” list (2010)
12. Project Setup
Systran 7.0 Hybrid
Help document, not “acrochecked”
880 words
120 SAS glossary terms
13. 2 versions of the source document:
o Unedited
o Edited
4 scenarios
o Untrained MT engine, unedited source document
o Untrained MT engine, edited source document
o Trained MT engine, unedited source document
o Trained MT engine, edited source document
Each file post-edited separately and time thoroughly tracked
15. 1. Use active verbs, avoid the gerund
Example 1: Use verbs to convey the most significant actions in your sentences
Source: Understanding the differences between owned and checked out alerts is critical to understanding SAS® Anti-Money Laundering.
Target: La compréhension des différences entre les alertes possédées et Extraites est critique au SAS® Anti-Money Laundering de compréhension.
Source: In order to understand SAS® Anti-Money Laundering, you need to understand the differences between owned alerts and checked out alerts.
Target: Afin de comprendre le SAS® Anti-Money Laundering, vous devez comprendre les différences entre les alertes détenues par un autre utilisateur et les alertes bloquées.
Afin de comprendre le fonctionnement de SAS® Anti-Money Laundering, vous devez comprendre les différences entre les alertes détenues par un autre utilisateur et les alertes bloquées.
16. 2. Avoid the passive voice
Example 1: Use verbs to convey the most significant actions in your sentences
Source: Risk-factor-only alerts can be identified by the Scenario and Triggering Values columns on an alert list window.
Target: Des alertes de type facteur de risque uniquement peuvent être identifiées par le scénario et des colonnes Valeurs de déclenchement sur une fenêtre de listes des alertes.
Source: To identify a risk-factor-only alert, the Scenario column of the alert list window displays either ML_Risk or TF_Risk.
Target: Pour identifier une alerte de type facteur de risque uniquement, la colonne Scénario de la fenêtre de listes des alertes montre ML_Risk ou TF_Risk.
Pour identifier une alerte de type facteur de risque uniquement, la colonne Scénario de la fenêtre de listes des alertes indique ML_Risk ou TF_Risk.
17. 3. Begin with the prepositional phrase
Example 1: Use verbs to convey the most significant actions in your sentences
Source: Click Check Out in that alert's Availability column on the Available Alerts window.
Target: Le clic Extraient dans la colonne Disponibilité de cette alerte sur la fenêtre Alertes disponibles.
Source: In the Available Alerts window, click Check Out in the alert's Availability column.
Target: Dans la fenêtre Alertes disponibles, le clic Extraient dans la colonne Disponibilité de l'alerte.
Dans la fenêtre Alertes disponibles, cliquez sur Extraire dans la colonne Disponibilité de l'alerte.
18. 4. Use short sentences with 1 idea
Example 1: Use verbs to convey the most significant actions in your sentences
Source: Alerts are displayed on alert list windows, which provide tools and information to aid users as they determine whether alerts represent suspicious activity that should be reported to authorities.
Target: Des alertes sont montrées sur les fenêtres de listes des alertes, qui fournissent des outils et des informations aux utilisateurs d'aide pendant qu'elles déterminent si les alertes représentent l'activité suspecte qui devrait être rapportée aux autorités.
Source: Alerts are displayed in alert list windows. The alert list windows provide tools and information that help users determine whether alerts indicate suspicious activity that should be reported to authorities.
Target: Des alertes sont montrées dans des fenêtres de listes des alertes. Les fenêtres de listes des alertes fournissent les outils et les informations qui aident des utilisateurs à déterminer si les alertes indiquent l'activité suspecte qui devrait être rapportée aux autorités.
Les alertes s’affichent dans des fenêtres de listes des alertes. Les fenêtres de listes des alertes fournissent les outils et les informations qui aident des utilisateurs à déterminer si les alertes indiquent une activité suspecte qui devrait être signalée aux autorités.
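A rule like "short sentences with 1 idea" can be partly checked before the text is sent to the MT engine. The following is a minimal sketch, not the tooling used in the study: the 25-word limit, the naive sentence splitter and the function names are all assumptions made for illustration. It is run on the unedited and edited source sentences from this slide.

```python
import re

MAX_WORDS = 25  # assumed threshold for illustration; not taken from the slides


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def long_sentences(text, max_words=MAX_WORDS):
    """Return (word_count, sentence) pairs for sentences over the limit."""
    flagged = []
    for sentence in split_sentences(text):
        count = len(sentence.split())  # words = whitespace-separated tokens
        if count > max_words:
            flagged.append((count, sentence))
    return flagged


unedited = (
    "Alerts are displayed on alert list windows, which provide tools and "
    "information to aid users as they determine whether alerts represent "
    "suspicious activity that should be reported to authorities."
)
edited = (
    "Alerts are displayed in alert list windows. The alert list windows "
    "provide tools and information that help users determine whether alerts "
    "indicate suspicious activity that should be reported to authorities."
)

for label, text in (("unedited", unedited), ("edited", edited)):
    hits = long_sentences(text)
    print(f"{label}: {len(hits)} sentence(s) over {MAX_WORDS} words")
```

Under this assumed limit, the single unedited sentence is flagged and the two edited sentences pass. A word count cannot tell whether a sentence carries one idea, so a check like this only points an editor at candidates for splitting.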
19. How long to post-edit 880 words?
Post-editing time (in min.)
No MT training, no source editing: 55
No MT training, with source editing: 50
With MT training, no source editing: 33
With MT training, with source editing: 24
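The post-editing times can be converted into hourly throughput for comparison with the "at least 1200 words per hour" figure. This is a sketch only: it assumes the chart values 55, 50, 33 and 24 belong to the four scenarios in left-to-right order, and the dictionary and function names are illustrative.

```python
WORDS = 880  # size of the test document (slide 12)

# Post-editing times in minutes, read off the chart on slide 19.
# Assumption: the values map to the scenarios in left-to-right order.
post_edit_minutes = {
    "no MT training, no source editing": 55,
    "no MT training, with source editing": 50,
    "with MT training, no source editing": 33,
    "with MT training, with source editing": 24,
}


def words_per_hour(words, minutes):
    """Convert a word count and a duration in minutes to words per hour."""
    return words * 60 / minutes


throughput = {
    scenario: words_per_hour(WORDS, minutes)
    for scenario, minutes in post_edit_minutes.items()
}

for scenario, rate in throughput.items():
    print(f"{scenario}: {rate:.0f} words/hour")
```

On these figures, only the two scenarios with a trained engine exceed 1200 words per hour.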
20. … compared to a traditional translation
[Bar chart; vertical axis from 0 to 120, bar values not recoverable]
21. Conclusions
+ A human translator = 120 minutes
+ Untrained MT is 2X faster
++ Trained MT is 4X faster
+++ Trained MT with source control is 5X faster
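The rounded factors above can be checked against the measured times. The sketch below assumes the 120-minute human baseline from this slide and the post-editing times from slide 19, mapped to scenarios in chart order; the variable names are illustrative.

```python
HUMAN_MINUTES = 120  # human translation baseline from the Conclusions slide

# Post-editing times in minutes from slide 19 (assumed chart order).
post_edit_minutes = {
    "untrained MT, unedited source": 55,
    "untrained MT, edited source": 50,
    "trained MT, unedited source": 33,
    "trained MT, edited source": 24,
}

# Speedup = baseline time divided by post-editing time.
speedup = {
    scenario: HUMAN_MINUTES / minutes
    for scenario, minutes in post_edit_minutes.items()
}

for scenario, factor in speedup.items():
    print(f"{scenario}: {factor:.1f}x faster than human translation")
```

The computed factors are about 2.2, 2.4, 3.6 and 5.0, so the slide's "2X", "4X" and "5X" are rounded values.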
22. A final word (or two) about MT Quality
“Contrary to all expectations, using MT in Bentley has improved the translation quality in the pilot projects.”
French OLH reviewer: “I give a 9…I find this translation very good…I found it better than the translations I used to see before.”
German courseware reviewer: “It was the best translation of courseware I ever read.”
Questions?
lori@lexcelera.com