Machine translation enables new business models that create new revenue sources for your business. However, integrating it into your workflow can be challenging. In this Master Class, Diego Bartolome covers the most important aspects and lessons learned over the past seven years, spanning technology, people, and processes.
6. [Figure: sustaining technology vs. disruptive technology, against the performance demanded in high-end and low-end markets]
7. Objectives for Machine Translation
Productivity gains
Direct cost reduction
Quality consistency
8. New uses for Machine Translation
Multilingual customer support
Social Media monitoring
Applications enabled by Big Data
Internet of Everything / Internet of Things
Speech-to-Speech translation
9. Questions: First Round
What is your experience with MT?
1. Quality Metrics
2. Cost reduction
3. Impact on Delivery Times
4. Feedback from Post-editors
5. Your Feelings
14. Costs of Machine Translation
Internal development – people and time
Free tools – Google + Bing
DIY (do-it-yourself) solutions
Traditional pricing model
tauyou managed solution
15. Revenue from Machine Translation
Translation as a Service
Private Machine Translation Portal
MT of internal communication (flat rate)
….
and many others!
16. Questions: Round 2
1. Where do you provide value now?
2. Where do you think the value will be?
3. How important is confidentiality?
4. Do you care about control?
5. How much could you invest in MT (time, people, money)?
6. When will your solution be available?
20. On Domain Quality
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
… Quality Order of your domains ...
21. Questions: Round 3
1. What is your main motivation?
2. Can you try more than 1 domain?
3. Can you train at least 2 language pairs?
4. Can you pilot several MT vendors?
5. What are your current expectations?
23. Corpora building
Related vs. unrelated materials
Percentage of out-of-domain
Does monolingual data help?
Corpora extension with linguistic processing
Ad-hoc corpus for file translation
The more, the better?
24. Data cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data (optimization)
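A minimal sketch of the first cleaning steps above (length, punctuation, repetitions) applied to translation memory segments. The function name `clean_tm` and the thresholds `max_ratio` and `max_len` are illustrative assumptions, not values from the talk; real pipelines usually tune them per language pair.

```python
def clean_tm(pairs, max_ratio=3.0, max_len=100):
    """Filter (source, target) segment pairs with simple heuristics.

    Thresholds are hypothetical defaults chosen for illustration.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace so duplicates are detected reliably.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # drop empty segments
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # drop overly long segments
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # drop pairs with implausible length ratio
        if (src[-1] in ".?!") != (tgt[-1] in ".?!"):
            continue  # drop pairs whose final punctuation disagrees
        if (src, tgt) in seen:
            continue  # drop exact repetitions
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Terminology and inconsistency checks need a glossary and are left out of this sketch.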
26. Training strategies
One single system with all TMs
+ glossaries
+ linguistic processing input/output
+ forbidden-word lists
Layered approach
Generic → domain → subdomain → client
27. Model optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve tokenization, recasing, ...
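As an illustration of "filter the translation tables", the sketch below prunes a text phrase table. It assumes a Moses-style layout (`source ||| target ||| scores ...`) in which the third score is the direct translation probability; the slides do not name a toolkit, so the format, the function name `filter_phrase_table`, and the thresholds are all assumptions to adapt.

```python
from collections import defaultdict

def filter_phrase_table(lines, min_prob=0.01, top_n=2, score_index=2):
    """Keep, per source phrase, the top_n entries whose score is >= min_prob.

    score_index=2 assumes the third score is p(target | source).
    """
    by_source = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # malformed line: treat as garbage
        src = fields[0]
        scores = [float(s) for s in fields[2].split()]
        if score_index >= len(scores):
            continue  # not enough scores on this line
        prob = scores[score_index]
        if prob >= min_prob:
            by_source[src].append((prob, line.rstrip("\n")))
    kept = []
    for candidates in by_source.values():
        # Highest-probability entries first, then cut to top_n.
        candidates.sort(key=lambda c: c[0], reverse=True)
        kept.extend(entry for _, entry in candidates[:top_n])
    return kept
```

Pruning like this trades a little coverage for a smaller, faster table; the weights still need tuning on a held-out tune set afterwards.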
28. Workflow integration
Use MT as a secondary TM
Bilingual pre-translated translation files
CAT tool integration
Differentiated workflow
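One way to read "use MT as a secondary TM" is: pre-translate from the translation memory when a fuzzy match is good enough, and fall back to MT otherwise. The sketch below assumes an in-memory dictionary as the TM, a caller-supplied `mt` function, and a 0.75 similarity threshold; all three are hypothetical stand-ins, and CAT tools compute fuzzy matches differently.

```python
from difflib import SequenceMatcher

def pretranslate(segment, tm, mt, threshold=0.75):
    """Return (translation, origin, best_tm_score) for one source segment.

    tm: dict mapping source segments to target segments.
    mt: callable that machine-translates a source segment.
    """
    best_src, best_score = None, 0.0
    for src in tm:
        # Character-level similarity as a rough fuzzy-match score.
        score = SequenceMatcher(None, segment, src).ratio()
        if score > best_score:
            best_src, best_score = src, score
    if best_src is not None and best_score >= threshold:
        return tm[best_src], "TM", best_score
    return mt(segment), "MT", best_score
```

Tagging each segment with its origin is what makes a differentiated workflow possible: TM matches and MT output can then be routed to different review or pricing rules.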
29. Continuous improvement
Qualitative
Use updated TMs in new trainings
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
Source content optimization
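Rule-based automatic post-editing can be as simple as an ordered list of regular-expression substitutions applied to the MT output. The three rules below are invented examples, not rules from the talk; in practice they would be derived from recurring post-editor corrections.

```python
import re

# Hypothetical rules, applied in order: (pattern, replacement).
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                  # no space before punctuation
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), r"\1"),  # collapse a doubled word
    (re.compile(r"\be-mail\b", re.IGNORECASE), "email"),    # enforce preferred term
]

def post_edit(text):
    """Apply every rule in RULES to the MT output, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

The doubled-word rule also fires on legitimate repetitions (for example "that that"), so rules like it are best applied selectively, per language and per client.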
30. Linguistic processing notes
In the source and/or target language
Grammar checking
Entity detection
Proper nouns, alphanumeric words, ...
Compound word splitting
Sentence reordering
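For entity detection, a common pre- and post-processing pattern is to mask alphanumeric tokens (part numbers, model codes) before translation and restore them afterwards, so the engine cannot alter them. The placeholder format `__ENTn__` and the function names are assumptions made for this sketch.

```python
import re

# A token that contains at least one digit and at least one ASCII letter.
ALNUM = re.compile(r"\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b")
PLACEHOLDER = re.compile(r"__ENT(\d+)__")

def mask_entities(text):
    """Replace alphanumeric tokens with numbered placeholders."""
    entities = []

    def replace(match):
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ALNUM.sub(replace, text), entities

def unmask_entities(text, entities):
    """Restore placeholders; unknown indices are left untouched."""
    def restore(match):
        index = int(match.group(1))
        return entities[index] if index < len(entities) else match.group(0)

    return PLACEHOLDER.sub(restore, text)
```

The masked text is what goes to the MT engine; `unmask_entities` runs on its output. Proper nouns need a separate detector, since they cannot be found by pattern alone.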
32. The Post-editor profile
Do the skills needed differ from those for translation?
Post-editing guidelines (TAUS)
Full vs. light post-editing
http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines
Compensation
34. Quality Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
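Two of the metrics above are easy to compute in-house. The sketch below implements Word Error Rate as word-level edit distance between the MT output and the post-edited reference, divided by the reference length, plus a relative time saving for post-editing versus translating from scratch. Function names are illustrative, and tokenization is plain whitespace splitting, which is an assumption.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    hyp, ref = hypothesis.split(), reference.split()
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, 1):
        current = [i]
        for j, ref_word in enumerate(ref, 1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete from hypothesis
                               current[j - 1] + 1,       # insert into hypothesis
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]

def wer(hypothesis, reference):
    """Edit distance normalized by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / n_ref

def time_saving(translation_time, postediting_time):
    """Fraction of time saved by post-editing instead of translating."""
    return 1 - postediting_time / translation_time
```

A lower WER against the post-edited text means fewer edits were needed. It complements, rather than replaces, BLEU scores and direct feedback from translators.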