How to Successfully Integrate Machine Translation in your Company
Diego Bartolome
@diegobartolome
dbc@tauyou.com
and others
70+ clients
18 countries
~700 Million words in 2014
All language pairs
[Chart: sustaining technology vs. disruptive technology, compared with the performance demanded in high-end markets and in low-end markets]
Objectives for Machine Translation
Productivity gains
Direct cost reduction
Quality consistency
New uses for Machine Translation
Multilingual customer support
Social Media monitoring
Applications enabled by Big Data
Internet of Everything / Internet of Things
Speech-to-Speech translation
Questions: First Round
What is your experience with MT?
1. Quality Metrics
2. Cost reduction
3. Impact on Delivery Times
4. Feedback from Post-editors
5. Your Feelings
Learning about Machine Translation
https://www.taus.net/think-tank/reports/translate-reports/taus-translation-technology-landscape-report
https://www.taus.net/think-tank/reports/translate-reports/moses-mt-market-report
http://www.lt-innovate.eu/resources/document/lt-20-13
http://www.gala-global.org/onDemand
Machine Translation Types
Google/Bing Translator vs. Moses
Advantages
Big(ger) data
State-of-the-art technology
Learning curve
Disadvantages
Black-box
Confidentiality
Control
Internal vs. external
Core competence
Resources
ROI
Time to market
Costs of Machine Translation
Internal development – people and time
Free tools – Google + Bing
DIY (do-it-yourself) solutions
Traditional pricing model
tauyou managed solution
Revenue from Machine Translation
Translation as a Service
Private Machine Translation Portal
MT of internal communication (flat rate)
... and many others!
Questions: Round 2
1. Where do you provide value now?
2. Where do you think the value will be?
3. How important is confidentiality?
4. Do you care about control?
5. How much could you invest in MT? (time, people, money)
6. When will your solution be available?
On Language Quality (I)
Source: translate.autodesk.com
On Language Quality (II)
Source: Philipp Koehn
Some Languages Sorted
From EN into
1) FR, ES, PT, IT
2) DE, NL, HE
3) ZH, JA, KO
4) RU, AR, TR, HI
On Domain Quality
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
… Quality Order of your domains ...
Questions: Round 3
1. What is your main motivation?
2. Can you try more than 1 domain?
3. Can you train at least 2 language pairs?
4. Can you pilot several MT vendors?
5. What are your current expectations?
Data acquisition
OPUS corpora
http://opus.lingfil.uu.se/
WMT workshops
e.g. http://www.statmt.org/wmt13/
Multilingual websites
TAUS
Corpora building
Related vs. unrelated materials
Percentage of out-of-domain
Does monolingual data help?
Corpora extension with linguistic processing
Ad-hoc corpus for file translation
The more, the better?
Data cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data (optimization)
Remark
Data cleaning and selection is a key process
Just more data may harm the quality
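The cleaning checks listed above (length, punctuation, repetitions) can be sketched as a small filter over translation memory segments. This is only a minimal illustration, not the tauyou pipeline: the function name `clean_tm`, the thresholds `max_ratio` and `max_len`, and the end-punctuation check are all assumptions, and the punctuation rule only makes sense for language pairs that share the same punctuation marks.

```python
END_PUNCT = ".?!:"  # assumption: both languages use these end marks


def clean_tm(pairs, max_ratio=3.0, max_len=100):
    """Filter (source, target) segment pairs with simple heuristics.

    Hypothetical helper: drops empty, overlong, badly length-mismatched,
    punctuation-mismatched and repeated segment pairs.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace on both sides.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # segment too long
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # suspicious length ratio
        src_end = src[-1] if src[-1] in END_PUNCT else ""
        tgt_end = tgt[-1] if tgt[-1] in END_PUNCT else ""
        if src_end != tgt_end:
            continue  # final punctuation does not match
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # repetition
        seen.add(key)
        kept.append((src, tgt))
    return kept
```

The thresholds would need tuning per language pair; a ratio of 3.0 may be too strict for pairs with very different word counts.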
Training strategies
One single system with all TMs
+ glossaries
+ linguistic processing input/output
+ forbidden words lists
Layered approach
Generic → domain → subdomain → client
Models optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve tokenization, recasing, ...
Workflow integration
Use MT as a secondary TM
Bilingual pre-translated translation files
CAT tool integration
Differentiated workflow
Continuous improvement
Qualitative
Use updated TMs in new trainings
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
Source content optimization
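Rule-based automatic post-editing, as mentioned above, can be as simple as an ordered list of search-and-replace rules applied to the MT output before it reaches the post-editor. The sketch below is an assumption about how such a step might look: the rules themselves (spacing before punctuation, doubled words, a made-up client terminology preference for "email") are illustrative examples only.

```python
import re

# Ordered (pattern, replacement) rules; all three are illustrative assumptions.
RULES = [
    # Remove stray whitespace before punctuation: "now ." -> "now."
    (re.compile(r"\s+([,.;:!?])"), r"\1"),
    # Collapse an immediately repeated word: "the the" -> "the"
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),
    # Hypothetical client terminology rule.
    (re.compile(r"\be-mail\b", re.IGNORECASE), "email"),
]


def post_edit(segment):
    """Apply every rule in order to one MT output segment."""
    for pattern, replacement in RULES:
        segment = pattern.sub(replacement, segment)
    return segment
```

In practice the rule list would probably be derived from recurring corrections made by post-editors, and some rules (for example the doubled-word rule) can remove legitimate repetitions, so they need review per language.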
Linguistic processing notes
In the source and/or target language
Grammar checking
Entities detection
Proper nouns, alphanumeric words, ...
Compound words splitting
Sentence reordering
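As one possible illustration of entity detection for alphanumeric words, the sketch below masks tokens that mix letters and digits (product codes and similar) before translation and restores them afterwards. The placeholder format `__ENT0__`, the function names, and the assumption that the MT engine passes placeholders through unchanged are all hypothetical; proper nouns are not handled here.

```python
import re

# Tokens made of letters, digits and hyphens that contain at least one
# digit and at least one letter, e.g. "X200-B" or "P7".
ALNUM = re.compile(r"\b(?=[A-Za-z-]*\d)(?=[\d-]*[A-Za-z])[A-Za-z\d-]+\b")


def mask_entities(text):
    """Replace alphanumeric tokens with numbered placeholders."""
    entities = []

    def repl(match):
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ALNUM.sub(repl, text), entities


def restore_entities(text, entities):
    """Put the original tokens back in place of the placeholders."""
    for i, entity in enumerate(entities):
        text = text.replace(f"__ENT{i}__", entity)
    return text
```

A typical use would be `masked, ents = mask_entities(source)`, then translating `masked`, then calling `restore_entities` on the MT output.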
Questions: Round 4
What is your preferred option?
How much can you invest in improvements?
The Post-editor profile
Do skills needed differ from translation?
Post-editing guidelines (TAUS)
Full vs. light post-editing
http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines
Compensation
Questions: Round 5
Do you have the right resources to start?
Quality Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
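Two of the metrics above are easy to compute in-house. The sketch below shows a word-level edit distance between raw MT output and its post-edited version, normalized by the post-edited length to give a Word Error Rate, plus a simple time-saving ratio for translation time vs. post-editing time. Function names are assumptions, tokenization is plain whitespace splitting, and using the post-edited text as the reference is one common choice rather than the only one.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a word from hyp
                            curr[j - 1] + 1,      # insert a word into hyp
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(mt_output, post_edited):
    """Word Error Rate of the MT output against the post-edited text."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


def time_saving(translation_time, post_editing_time):
    """Fraction of time saved by post-editing instead of translating."""
    return (translation_time - post_editing_time) / translation_time
```

For example, `wer("the cat sat on mat", "the cat sat on the mat")` counts one missing word against a six-word reference.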
Questions: Round 6
Are you able to measure?
Once upon an industry ...
Change before you have to
Jack Welch

Machine Translation Master Class at the EUATC Conference by Diego Bartolome

  • 1. How to Successfully Integrate Machine Translation in your Company Diego Bartolome @diegobartolome dbc@tauyou.com
  • 3. 70+ clients 18 countries ~700 Million words in 2014 All language pairs
  • 6. performance demanded in high end markets performance demanded in low end markets sustaining technology disruptive technology
  • 7. Objectives for Machine Translation Productivity gains Direct cost reduction Quality consistency
  • 8. New uses for Machine Translation Multilingual customer support Social Media monitoring Applications enabled by Big Data Internet of Everything /Internet of Things Speech-to-Speech translation
  • 9. Questions: First Round What is your experience with MT? 1. Quality Metrics 2. Cost reduction 3. Impact on Delivery Times 4. Feedback from Post-editors 5. Your Feelings
  • 10. Learning about Machine Translation https://www.taus.net/think-tank/reports/translate-reports/taus-translation-technology-landscape-report https://www.taus.net/think-tank/reports/translate-reports/moses-mt-market-report http://www.lt-innovate.eu/resources/document/lt-20-13 http://www.gala-global.org/onDemand
  • 12. Google/Bing Translator vs. Moses Advantages Big(ger) data State-of-the-art technology Learning curve Disadvantages Black-box Confidentiality Control
  • 13. Internal vs. external Core competence Resources ROI Time to market
  • 14. Costs of Machine Translation Internal development – people and time Free tools – Google + Bing DIY solutions Traditional pricing model tauyou managed solution
  • 15. Revenue from Machine Translation Translation as a Service Private Machine Translation Portal MT of internal communication (flat rate) …. and many others!
  • 16. Questions: Round 2 1. Where do you provide value now? 2. Where do you think the value will be? 3. How important is confidentiality? 4. Do you care about control? 5. How much could you invest in MT? (time, people, money) 6. When will your solution be available?
  • 17. On Language Quality (I) Source: translate.autodesk.com
  • 18. On Language Quality (II) Source: Philipp Koehn
  • 19. Some Languages Sorted From EN into 1) FR, ES, PT, IT 2) DE, NL, HE 3) ZH, JA, KO 4) RU, AR, TR, HI
  • 20. On Domain Quality Who is willing to pay? Where does your revenue come from? What are your key skills? What domains achieve good quality? … Quality Order of your domains ...
  • 21. Questions: Round 3 1. What is your main motivation? 2. Can you try more than 1 domain? 3. Can you train at least 2 language pairs? 4. Can you pilot several MT vendors? 5. What are your current expectations?
  • 22. Data acquisition OPUS corpora http://opus.lingfil.uu.se/ WMT workshops e.g. http://www.statmt.org/wmt13/ Multilingual websites TAUS
  • 23. Corpora building Related vs. unrelated materials Percentage of out-of-domain Does mono-lingual data help? Corpora extension with linguistic processing Ad-hoc corpus for file translation The more, the better?
  • 24. Data cleaning Clean translation memories Length, punctuation, terminology, … Inconsistencies, repetitions, ... Segment splitting Optimize weight of most frequent n-grams Validate their translations Add out-of-domain data (optimization)
  • 25. Remark Data cleaning and selection is a key process Just more data may harm the quality
  • 26. Training strategies One single system with all TMs + glossaries + linguistic processing input/output + forbidden words lists Layered approach Generic → domain → subdomain → client
  • 27. Models optimization Filter the translation tables Remove the garbage + tune weights Optimize language models Adapt them to the translation purpose Tune parameters correctly Tune set, test set, optimization parameters Improve tokenization, recasing, ...
  • 28. Workflow integration Use MT as a secondary TM Bilingual pre-translated translation files CAT tool integration Differentiated workflow
  • 29. Continuous improvement Qualitative Use updated TMs in new trainings Immediate (incremental) retraining Rule-based automatic post-editing Selective pre- and/or post-processing Source content optimization
  • 30. Linguistic processing notes In the source and/or target language Grammar checking Entities detection Proper nouns, alphanumeric words, ... Compound words splitting Sentence reordering
  • 31. Questions: Round 4 What is your preferred option? How much can you invest in improvements?
  • 32. The Post-editor profile Do skills needed differ from translation? Post-editing guidelines (TAUS) Full vs. light post-editing http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines Compensation
  • 33. Questions: Round 5 Do you have the right resources to start?
  • 34. Quality Metrics SMT metrics: BLEU, NIST Feedback from translators Translation time vs. Post-editing time Word Error Rate (WER) or Edit Distance Cost reduction
  • 35. Questions: Round 6 Are you able to measure?
  • 37. Once upon an industry ...
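
Slide 24 lists length, punctuation and repetitions among the checks for cleaning translation memories. A minimal Python sketch of such a filter follows. The function name `clean_segments`, the thresholds `max_len_ratio` and `max_tokens`, and the end-punctuation heuristic are illustrative assumptions, not part of the deck, and the punctuation check only suits language pairs that share `.`, `!` and `?`.

```python
def clean_segments(pairs, max_len_ratio=3.0, max_tokens=100):
    """Filter (source, target) segment pairs with simple heuristics.

    All thresholds are illustrative assumptions; tune them per language pair.
    """
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Drop pairs where either side is empty.
        if not source or not target:
            continue
        src_len, tgt_len = len(source.split()), len(target.split())
        # Length check: overly long segments and lopsided pairs.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_len_ratio:
            continue
        # Punctuation check: both sides should agree on sentence-final punctuation.
        if (source[-1] in ".!?") != (target[-1] in ".!?"):
            continue
        # Repetition check: keep only the first copy of each pair.
        key = (source.lower(), target.lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append((source, target))
    return kept


if __name__ == "__main__":
    sample = [
        ("Press the button.", "Pulse el botón."),
        ("Press the button.", "Pulse el botón."),
        ("Open the file", "Abra el archivo."),
    ]
    print(clean_segments(sample))
```

Terminology and inconsistency checks from the same slide need glossaries or a comparison across segments, so they are left out of this sketch.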
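
Slide 29 mentions rule-based automatic post-editing as one route to continuous improvement. The sketch below shows the general idea with a few regular-expression rules applied in order. The rule set `RULES` and the function `post_edit` are hypothetical examples: real rules are language-specific (French, for instance, expects a space before some punctuation marks) and would be derived from recurring post-editor corrections.

```python
import re

# Hypothetical rules for illustration; each entry is (pattern, replacement).
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                # no space before punctuation
    (re.compile(r" {2,}"), " "),                          # collapse repeated spaces
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),  # drop an immediately repeated word
]


def post_edit(text, rules=RULES):
    """Apply each rule in order to raw MT output and return the corrected text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()


if __name__ == "__main__":
    print(post_edit("The  file file was saved ."))
```

The repeated-word rule would also alter legitimate sequences such as "that that", which is one reason such rules are usually applied selectively.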
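
Slide 34 names Word Error Rate (WER) or edit distance among the quality metrics. A minimal sketch of a word-level computation is given below, assuming whitespace tokenization and taking the post-edited segment as the reference. The function names are illustrative and not taken from any particular toolkit.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance between MT output and a reference."""
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j] holds the distance between the hypothesis words processed
    # so far and the first j reference words.
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = previous[j - 1] + (hyp_word != ref_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(hypothesis, reference):
    """Edit distance divided by the number of reference words."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / ref_len


if __name__ == "__main__":
    mt_output = "the cat sat on mat"
    post_edited = "the cat sat on the mat"
    print(word_edit_distance(mt_output, post_edited))
    print(word_error_rate(mt_output, post_edited))
```

Comparing raw MT output with its post-edited version in this way gives a rough measure of post-editing effort, which can be tracked alongside the translation-time and post-editing-time figures listed on the same slide.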