Multi-system machine translation
using online APIs for English-Latvian
Matīss Rikters
University of Latvia
ACL 2015 Fourth Workshop on
Hybrid Approaches to Translation
Beijing, 31.07.2015
Introduction
• Motivation:
  - Doctoral studies at the University of Latvia
  - A hybrid machine translation method combining the results of various machine translation systems
• Literature review:
  - Recent trends in multi-system machine translation
  - Nothing similar was found publicly available
Introduction
• Goals:
  - Combine output from multiple online MT APIs
  - Keep it simple
  - Make it work fast
Related work
• A. Ahsan and P. Kolachina, "Coupling Statistical Machine Translation with Rule-based Transfer and Generation"
• Y. Akiba, T. Watanabe and E. Sumita, "Using language and translation models to select the best among outputs from multiple MT systems"
• L. Barrault, "MANY: Open source machine translation system combination"
• C. Callison-Burch and R. S. Flournoy, "A program for automatically selecting the best output from multiple machine translation engines"
Initial plan
• Use systems that support English–Latvian translation
• Found five such systems
What worked
• Could not get the APIs of two of them to work
• Used the remaining three: Google Translate, Bing Translator and LetsMT
System description
• Sentence tokenization
• Translation with the APIs of Google Translate, Bing Translator and LetsMT
• Selection of the best translation
• Output
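The four stages above can be sketched in Python as follows. This is a minimal illustration, not the MSHT code (which is written in PHP): the regex sentence splitter, the `Translator` callables standing in for the Google Translate, Bing Translator and LetsMT API clients, and the `score` function (any cost where lower is better, such as language-model perplexity) are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical type: a translator maps one English sentence to one Latvian sentence.
Translator = Callable[[str], str]


def tokenize_sentences(text: str) -> List[str]:
    """Naive sentence splitter (an assumption; MSHT may split differently)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def hybrid_translate(text: str,
                     systems: Dict[str, Translator],
                     score: Callable[[str], float]) -> List[str]:
    """Translate every sentence with every system and keep the lowest-cost output."""
    output = []
    for sentence in tokenize_sentences(text):
        candidates = {name: translate(sentence) for name, translate in systems.items()}
        best_name = min(candidates, key=lambda name: score(candidates[name]))
        output.append(candidates[best_name])
    return output


if __name__ == "__main__":
    # Stand-ins for the three API clients.
    systems = {
        "google": lambda s: s.upper(),
        "bing": lambda s: s.lower(),
        "letsmt": lambda s: s,
    }
    # Toy cost: number of upper-case letters (a real system would use perplexity).
    toy_cost = lambda t: sum(c.isupper() for c in t)
    print(hybrid_translate("Hello world. How are you?", systems, toy_cost))
```

Passing the scorer in as a parameter keeps the API calls and the selection step independent, so another MT API or a different selection criterion can be plugged in without touching the loop.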
Selection of the best translation
Probabilities are calculated based on the observed entry with the longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:

$$PP(q) = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$

where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$. A lower perplexity means that $q$ predicts the sample better.
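A pure-Python sketch of the backoff lookup and perplexity-based selection described above. Assumptions: the toy bigram model (log10 probabilities and backoff weights stored as in an ARPA file), the unknown-word penalty `UNK_LOG10`, and the rule that reports "Equal" when candidates tie are illustrative only; in practice a toolkit such as KenLM computes these scores, and picking the lowest-perplexity candidate is inferred from the slide rather than stated on it. The code works in base 10 because ARPA files do; perplexity does not depend on the logarithm base, since $b^{-\frac{1}{N}\sum \log_b q(x_i)} = \left(\prod q(x_i)\right)^{-1/N}$ for any base $b$.

```python
import math
from typing import Dict, List, Sequence, Tuple

# n-gram tuple -> (log10 probability, log10 backoff weight), as in an ARPA file.
Model = Dict[Tuple[str, ...], Tuple[float, float]]

UNK_LOG10 = -100.0  # hypothetical penalty for words missing from the model


def log10_prob(model: Model, context: Sequence[str], word: str, order: int) -> float:
    """log10 p(word | context), backing off to the longest matching history."""
    history = tuple(context[-(order - 1):]) if order > 1 else ()
    # f is 0-based here: history[f:] is w_{f+1}^{n-1} in the slide's notation.
    for f in range(len(history) + 1):
        ngram = history[f:] + (word,)
        if ngram in model:
            logp = model[ngram][0]
            # Add log10 b(w_i^{n-1}) for every longer context that was skipped;
            # contexts absent from the model have backoff weight 1 (log10 = 0).
            for i in range(f):
                logp += model.get(history[i:], (0.0, 0.0))[1]
            return logp
    return UNK_LOG10


def perplexity(model: Model, sentence: str, order: int) -> float:
    """10 ** (-(1/N) * sum of log10 q(x_i)); <s> is context only, </s> is scored."""
    words = sentence.split() + ["</s>"]
    context: List[str] = ["<s>"]
    total = 0.0
    for word in words:
        total += log10_prob(model, context, word, order)
        context.append(word)
    return 10 ** (-total / len(words))


def select_best(model: Model, candidates: Dict[str, str], order: int) -> str:
    """Name of the system with the lowest-perplexity output, or "Equal" on a tie."""
    scores = {name: perplexity(model, text, order) for name, text in candidates.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if math.isclose(score, best)]
    return winners[0] if len(winners) == 1 else "Equal"


# Toy bigram model with made-up numbers over two Latvian words.
toy: Model = {
    ("<s>",): (-99.0, -0.3),
    ("</s>",): (-1.0, 0.0),
    ("labs",): (-1.5, -0.2),
    ("rīts",): (-1.7, -0.1),
    ("<s>", "labs"): (-0.5, 0.0),
    ("labs", "rīts"): (-0.4, 0.0),
}
```

With this model, `select_best(toy, {"google": "labs rīts", "bing": "rīts labs"}, 2)` returns `"google"`: the first candidate follows bigrams seen in the model (perplexity $10^{2/3} \approx 4.64$), while the second falls back to unigrams and pays backoff penalties (perplexity $10^{1.6} \approx 39.8$).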
System usage
• Get the code: https://github.com/M4t1ss/Multi-System-Hybrid-Translator
• Get API access:
  - Google: https://cloud.google.com/translate/
  - Bing: http://www.bing.com/dev/en-us/translator
  - LetsMT: https://www.letsmt.eu/Integration.aspx
• Add the API keys to the configuration
• Prepare a language model:
  - You can use KenLM: https://kheafield.com/code/kenlm/
• Prepare the input data
• Run:
  php MSHT.php languageModel.binary inputSentances.txt
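The preparation and run steps can be scripted, for example from Python. This sketch assumes that KenLM's `lmplz` and `build_binary` executables and `php` are on `PATH`, that it runs from the repository root after the API keys have been configured, and that the training corpus is tokenized text with one sentence per line; the 5-gram default and the file stem `languageModel` are illustrative choices, not values from the slides.

```python
import shutil
import subprocess
from contextlib import ExitStack
from typing import List, Optional, Tuple

# One step = (argv, file fed on stdin, file that captures stdout).
Step = Tuple[List[str], Optional[str], Optional[str]]


def kenlm_steps(corpus: str, order: int = 5,
                stem: str = "languageModel") -> Tuple[List[Step], str]:
    """Build the KenLM commands: estimate an ARPA model, then binarize it."""
    arpa, binary = f"{stem}.arpa", f"{stem}.binary"
    steps: List[Step] = [
        # lmplz reads training text on stdin and writes an ARPA model to stdout.
        (["lmplz", "-o", str(order)], corpus, arpa),
        # build_binary converts the ARPA file into KenLM's binary format.
        (["build_binary", arpa, binary], None, None),
    ]
    return steps, binary


def msht_command(lm_binary: str, input_file: str) -> List[str]:
    """The invocation shown on the slide."""
    return ["php", "MSHT.php", lm_binary, input_file]


def run(corpus: str, input_file: str, order: int = 5) -> None:
    """Train the language model, then run MSHT on the input sentences."""
    steps, binary = kenlm_steps(corpus, order)
    steps.append((msht_command(binary, input_file), None, None))
    for argv, stdin_path, stdout_path in steps:
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} is not on PATH")
        with ExitStack() as stack:
            stdin = stack.enter_context(open(stdin_path, "rb")) if stdin_path else None
            stdout = stack.enter_context(open(stdout_path, "wb")) if stdout_path else None
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
```

A call such as `run("corpus.lv.txt", "inputSentances.txt")` (with a hypothetical Latvian training corpus) would produce `languageModel.binary` and then pass it to `MSHT.php`. Building the commands separately from executing them makes the steps easy to inspect before anything is run.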
Experiments
• MT system APIs:
  - Google Translate
  - Bing Translator
  - TB2013 EN-LV v03 from LetsMT
• Language model:
  - JRC Acquis corpus, version 2.2
• Input sentences:
  - JRC Acquis corpus, version 2.2
  - ACCURAT balanced test corpus for under-resourced languages
Experiment results – JRC Acquis
System                          BLEU   TER    WER     Translations selected
                                                      Google   Bing     LetsMT   Equal
Google Translate                16.92  47.68  58.55   100 %    -        -        -
Bing Translator                 17.16  49.66  58.40   -        100 %    -        -
LetsMT                          28.27  36.19  42.89   -        -        100 %    -
Hybrid Google + Bing            17.28  48.30  58.15   50.09 %  45.03 %  -        4.88 %
Hybrid Google + LetsMT          22.89  41.38  50.31   46.17 %  -        48.39 %  5.44 %
Hybrid LetsMT + Bing            22.83  42.92  50.62   -        45.35 %  49.84 %  4.81 %
Hybrid Google + Bing + LetsMT   21.08  44.12  52.99   28.93 %  34.31 %  33.98 %  2.78 %
Experiment results – ACCURAT balanced
System                          BLEU
Google Translate                24.73
Bing Translator                 22.07
LetsMT                          32.01
Hybrid Google + Bing            23.75
Hybrid Google + LetsMT          28.94
Hybrid LetsMT + Bing            27.44
Hybrid Google + Bing + LetsMT   26.74
Human evaluation
• Five native Latvian speakers were given a random 2% sample (32 sentences)
• They were asked to mark which of the three MT outputs was the best, the worst, or OK
• Multiple outputs could be marked as best, worst, or OK
Human results
System   User 1   User 2   User 3   User 4   User 5   AVG user   Hybrid   BLEU
Bing     21.88%   53.13%   28.13%   25.00%   31.25%   31.88%     28.93%   16.92
Google   28.13%   25.00%   25.00%   28.13%   46.88%   30.63%     34.31%   17.16
LetsMT   50.00%   21.88%   46.88%   46.88%   21.88%   37.50%     33.98%   28.27
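A quick arithmetic check of the human results: every per-user percentage corresponds to a whole number of the 32 evaluated sentences (for example, 7/32 = 21.875% ≈ 21.88%), and the AVG user column is the mean of the five users' values. What each count represents (for example, how often a system's output was marked best) is not stated on the slide, so the sketch only reproduces the numbers.

```python
# Per-user percentages for each system, copied from the table above.
results = {
    "Bing":   [21.88, 53.13, 28.13, 25.00, 31.25],
    "Google": [28.13, 25.00, 25.00, 28.13, 46.88],
    "LetsMT": [50.00, 21.88, 46.88, 46.88, 21.88],
}
SENTENCES = 32


def average(values):
    """Arithmetic mean, as in the AVG user column."""
    return sum(values) / len(values)


def as_count(percent, total=SENTENCES):
    """Recover the whole number of sentences behind a rounded percentage."""
    count = round(percent * total / 100)
    # The table value should equal count/total, rounded to two decimals.
    assert abs(count / total * 100 - percent) < 0.01, percent
    return count


for system, values in results.items():
    counts = [as_count(v) for v in values]
    print(f"{system}: counts={counts} avg={average(values):.2f}%")
# Bing: counts=[7, 17, 9, 8, 10] avg=31.88%
# Google: counts=[9, 8, 8, 9, 15] avg=30.63%
# LetsMT: counts=[16, 7, 15, 15, 7] avg=37.50%
```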
Conclusion
• Simple to:
  - build
  - use
  - extend with new MT APIs
• Works well when the combined systems are of similar quality
• Performs poorly when one system is much better than the others
• Needs:
  - improvements to translation selection
  - more configuration options
Future work
• Use a bigger and better language model?
  - Tried it: about the same results
• Confusion networks?
  - Too confusing for now
• Use MT quality estimation to select the best candidates:
  - QuEst or QuEst++
  - Other quality estimation methods
• Split sentences into smaller chunks, translate them, and recombine the results
Thank you!
http://ej.uz/MSHT-GITHUB
http://ej.uz/MSMT-EN-LV
