3. What we aim to cover today
How to improve the quality of your MT engine
A Build – Measure – Learn process
How do we measure and quantify quality in MT?
Practical illustrations of KantanMT in action throughout
Questions & Answers
3 Steps to Improve Quality
4. 3 Steps to Higher Quality
• Evolutionary process
  • Not a one-off step
  • Continuous improvement loop
  • Incremental improvements over time
• GIGO
  • Garbage In => Garbage Out
• At least three iterations
  • Experimentation with different inputs
  • Measurement of different outputs
  • Control your own destiny
[Diagram labels: Build, Measure, Learn, KantanMT Engine, Production Engine]
5. Build – Building quality training streams
• Training Data
  • How KantanMT learns to translate
  • Mimic your style, terminology, fluency
• Bad Training Data
  • Garbage In => Garbage Out
• Three main factors:
  • Quality
  • Relevance to domain
  • Quantity
6. Build – Building quality training streams
Training Data - Three main factors:
Quality
• The linguistic quality of the training material is crucially important
Relevance to domain
• A high-quality MT system has good domain knowledge
• Similar to the way you’ve always worked with Translation Memories and CAT tools
Quantity
• The more training data you use to build your engine, the better its capacity to generate translations that mimic your translation style and terminology
7. Build – Building quality training streams
Balancing the equation
[Figure; surviving label: Quality]
8. Build – Is quantity important?
Less so if quality is good – it’s a balancing act!
[Figure; surviving label: Quality]
9. Build – Building quality training streams
Quality Training Data - Suitable Sources
* KantanMT Stock Engines
* Translation Memories
* Other Translation Memories
* Monolingual (target only) data
[Diagram labels: Language Base Data, Translation sources, Domain Base Data, Monolingual Data, Training Data]
10. Build – Building quality training streams
Advantages of clean, high-quality training data
• Less correction of errors
• Finding the cause of errors is easier
• Easy to fill gaps
• Faster processing time
Drawbacks of dirty training data
• Large volumes of dirty data make correction difficult
• Finding the root cause of a problem is challenging
• Slower training and processing time
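As a rough illustration of what "clean" training data can mean in practice, the sketch below filters a list of source/target segment pairs before they are used for training. The function names, the specific checks, and the 3:1 length-ratio threshold are all assumptions made for this example; they are not KantanMT features or settings.

```python
import re

TAG = re.compile(r"<[^>]+>")  # leftover markup such as <b> or </span>

def is_clean_pair(source: str, target: str, max_ratio: float = 3.0) -> bool:
    """Return True if a segment pair passes some basic hygiene checks."""
    src, tgt = source.strip(), target.strip()
    if not src or not tgt:
        return False  # one side is empty
    if src == tgt:
        return False  # target identical to source: probably untranslated
    if TAG.search(src) or TAG.search(tgt):
        return False  # markup left in the text
    if "\ufffd" in src or "\ufffd" in tgt:
        return False  # Unicode replacement character: encoding damage
    s_len, t_len = len(src.split()), len(tgt.split())
    if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
        return False  # very different lengths: likely misaligned
    return True

def clean_corpus(pairs):
    """Drop duplicate and dirty pairs, keeping the original order."""
    seen, kept = set(), []
    for src, tgt in pairs:
        key = (src.strip(), tgt.strip())
        if key in seen or not is_clean_pair(src, tgt):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Checks like these are deliberately conservative: it is usually cheaper to lose a few good segments than to train on misaligned or damaged ones.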
11. Build – in KantanMT
13. Measure – KantanMT engine calibration
What to Measure?
BLEU
F-Measure
Word-counts
TER
14. Measure – KantanMT engine calibration
BLEU Score
Scoring system developed to automate the evaluation of MT output
Internationally recognised and the most widely used measure of the quality of your MT engine
The BLEU metric scores a translation on a scale of 0 to 100%
The closer to 100%, the more closely the translation matches a human reference translation
AIM: HIGH
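To make the idea behind the score concrete, here is a minimal sketch of a sentence-level BLEU calculation: clipped n-gram precision for 1- to 4-grams, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenisation, a single reference, and no smoothing, and the function names are chosen for this example; it is not how KantanMT computes its score.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU on a 0-100 scale (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipping: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)
```

In practice BLEU is computed over a whole test set rather than a single sentence, which avoids most of the zero-score cases this sketch returns for short segments.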
15. Measure – KantanMT engine calibration
F-Measure Score
F-Measure is an automated measurement of the precision and recall capabilities of an engine
A general guide to the overall quality performance of an engine
Combines the recall and precision measurements (a harmonic mean)
Displayed as a percentage value on a scale of 0 to 100%
AIM: HIGH
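The relationship between precision, recall and the F-Measure can be sketched as follows. This example assumes whitespace tokenisation, compares bags of words, and weights precision and recall equally (the F1 variant); the function name is chosen for illustration and the exact formulation used by KantanMT may differ.

```python
from collections import Counter

def f_measure(candidate: str, reference: str) -> float:
    """Word-level F1 between MT output and a reference, on a 0-100 scale."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # words found in both, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of output words that are correct
    recall = overlap / sum(ref.values())      # share of reference words that were produced
    return 100.0 * 2 * precision * recall / (precision + recall)
```

For example, the output "the cat sat" against the reference "the cat sat on the mat" has perfect precision but only 50% recall, giving an F-Measure of about 67%.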
16. Measure – KantanMT engine calibration
TER Score
A method that helps predict post-editing effort
TER is quick to use and correlates highly with actual post-editing effort
A TER score is a value in the range of 0 to 100%
AIM: LOW
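The intuition behind TER is the number of word edits needed to turn the MT output into the reference, divided by the reference length. The sketch below is a simplified stand-in that counts only insertions, deletions and substitutions (word-level edit distance); full TER also allows block shifts, which are left out here. The function name and whitespace tokenisation are assumptions for this example.

```python
def simple_ter(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by reference length, as a percentage."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between the words seen so far and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

A lower value means fewer edits per reference word, which is why the aim for this metric is low rather than high.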
17. Measure – KantanMT engine calibration
Word-counts
At least 1.5-2.0 million words to build a predictable, quality KantanMT engine
With fewer than 2m words, the engine should be used only in a narrow domain
For a wide-domain engine, you would need on the order of 10-15m words of training data
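The word-count guidance above can be turned into a quick check on a set of training segments. This is a sketch only: the thresholds come from the slide, but the exact band edges, the wording of the labels, and the treatment of the 2m to 10m range (which the slide does not cover) are assumptions for this example.

```python
def count_words(segments):
    """Total whitespace-separated words across all segments."""
    return sum(len(segment.split()) for segment in segments)

def domain_guidance(word_count: int) -> str:
    """Map a training word count onto the rough guidance from the slide."""
    if word_count < 1_500_000:
        return "below the suggested minimum"
    if word_count < 2_000_000:
        return "narrow domain only"
    if word_count < 10_000_000:
        # Not covered by the slide; treated here as an in-between band.
        return "between narrow and wide guidance"
    return "wide domain feasible"
```

For instance, `domain_guidance(172_000)` returns "below the suggested minimum", which is consistent with a low word count being flagged as a coverage problem.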
18. Measure – KantanMT engine calibration
Track your scores using KantanWatch™
24. Learn – KantanMT Experimentation
Running and learning from your first translation job
BLEU: 24%
F-Measure: 50%
TER: 66%
Wordcount: 172K
25. Learn – KantanMT Experimentation
Learn from examining the output
Scores: BLEU Low, F-Measure OK, TER High, Wordcount Low
Catalog Errors
• Untranslated text
• Incorrect numeric formatting
• Invalid characters
• High level of post-editing required
Conclusions
• Engine coverage is bad due to low wordcount
• Post-editing is high due to low engine coverage
• Training data doesn’t contain correct numeric formatting
• Bad formatting in training data
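Cataloguing errors can be partly automated before a linguist reviews the output. The sketch below counts three of the error types listed above across a set of (source, MT output) pairs. The category names, the regular expression, and the rule that any difference in number strings is flagged for review are assumptions for this example; a flagged numeric difference may turn out to be correct localisation.

```python
import re
from collections import Counter

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # e.g. 12, 3.5, 1,234.56

def catalog_errors(pairs):
    """Count simple error categories over (source, mt_output) pairs."""
    counts = Counter()
    for source, output in pairs:
        if output.strip() == source.strip():
            counts["untranslated"] += 1
        if "\ufffd" in output:
            counts["invalid_character"] += 1  # Unicode replacement character
        # Flag for review when the number strings differ in any way.
        if sorted(NUMBER.findall(source)) != sorted(NUMBER.findall(output)):
            counts["numeric_mismatch"] += 1
    return counts
```

The resulting counts give a starting point for conclusions such as "coverage is low" or "numeric formatting needs a rule", which a reviewer can then confirm on real segments.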
26. Learn – KantanMT Experimentation
Learn from examining the output
Scores: BLEU Low, F-Measure OK, TER High, Wordcount Low
Action Plan
• Coverage – More relevant, high-quality training data required. Also use a Glossary File to improve terminology consistency and accuracy.
• Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
• Invalid Character – Use a PEX rule to fix the invalid character issue
• Post-Editing – With more training data, the KantanMT engine will perform better overall
27. Build – Action Plan
Action Plan
• Coverage – More relevant, high-quality training data required
• Post-Editing – With more training data, the KantanMT engine will perform better overall
28. Measure – Action Plan
Your latest scores are…
29. Measure – Action Plan
Results using more relevant, high-quality Training Data
BLEU: 64% (Excellent)
F-Measure: 63% (Very Good)
TER: 33% (Very Good)
Wordcount: 479K (Good)
Previously: BLEU Low, F-Measure OK, TER High, Wordcount Low
31. Learn/Build – Action Plan
Action Plan
• Coverage – Use a Glossary File to improve terminology consistency and accuracy
• Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
• Invalid Character – Use a PEX rule to fix the invalid character issue
32. Learn/Build – Action Plan
[Screenshots: PEX file; original output]
Action Plan
• Coverage – Use a Glossary File to improve terminology consistency and accuracy
• Numeric Formatting – Use a PEX rule to post-edit the translation and fix numeric formats
• Invalid Character – Use a PEX rule to fix the invalid character issue
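The kind of fix a post-editing rule applies can be illustrated with ordinary search-and-replace patterns. The sketch below is a Python stand-in, not actual PEX syntax: the rule list, the function name, and the choice of a decimal-comma target locale are all assumptions for this example.

```python
import re

# Each rule is (compiled pattern, replacement), applied in order.
RULES = [
    (re.compile(r"(?<=\d)\.(?=\d)"), ","),  # decimal point between digits -> decimal comma
    (re.compile("\ufffd"), ""),             # drop Unicode replacement characters
    (re.compile(r" {2,}"), " "),            # collapse repeated spaces
]

def post_edit(segment: str) -> str:
    """Apply every rule to one MT output segment."""
    for pattern, replacement in RULES:
        segment = pattern.sub(replacement, segment)
    return segment.strip()
```

Rules like the first one are blunt: they would also rewrite version numbers such as 1.2.3, so each rule should be tested on real output before it is applied to a production job.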
33. Build – Measure – Learn – The Results
Analyse output
Untranslated text
Numeric Formatting
Invalid Character
IMPROVED QUALITY
34. Build – Measure – Learn
[Diagram labels: Build, Measure, Learn, KantanMT Engine]
Human Post-Editing as part of the Learn step
• Take the KantanMT output
• Have a linguist post-edit it
• Re-build the KantanMT Engine
• Rapidly improves the quality of your KantanMT Engine
39. Summary – Build-Measure-Learn
You, as an LSP or Language Professional, provide:
• Extensive language expertise
• Skills to ensure the accuracy and precision of your translation
• Management and maintenance of TMs for your clients for use in your CAT tools
KantanMT provides:
• Software and hardware to Build your engines
• Quality metrics to Measure the quality of your engine
• Tools and process to Learn and then teach your engine
• Support and help
40. Summary – Build-Measure-Learn
Follow this Build – Measure – Learn process
KantanMT will increase Productivity
Process more words per hour per day
Net result?
Higher Earnings
More Income
Better Margins
42. Additional information
For additional information please visit:
http://www.kantanmt.com
Contact me at:
Kevin McCoy
E-mail: kevinmcc@kantanmt.com
Mobile: +353 86 823 1527