Tenth MT Marathon 2015
Prague, Czech Republic
September 7-12, 2015
Lectures
MT Evaluation
Yvette Graham (DCU)
http://ufal.mff.cuni.cz/mtm15/files/01-mt-evaluation-yvette-graham.pdf
Introduction to Machine Translation and
Phrase-Based Machine Translation
http://ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf
Aleš Tamchyna (UFAL)
MT Talks
Ondřej Bojar
• http://mttalks.ufal.cz/
• Mini-lectures on MT
• Coding Exercises that complement the lectures
• Intro: Why is MT difficult, approaches to MT.
• MT that Deceives: Serious translation errors even for short and simple inputs.
• Pre-processing: Normalization and other technical tricks bound to help your MT system.
• MT Evaluation in General: Techniques of judging MT quality, dimensions of translation quality, number of possible translations.
• Automatic MT Evaluation: Two common automatic MT evaluation methods: PER and BLEU.
• Data Acquisition: The need for and possible sources of training data for MT, the diminishing utility of new data additions due to Zipf's law.
• Sentence Alignment: An introduction to the Gale & Church sentence alignment algorithm.
• Word Alignment: Cutting the chicken-egg problem.
• Phrase-based Model: Copy if you can.
• Constituency Trees: Divide and conquer.
• Dependency Trees: Trees with gaps.
• Rich Vocabulary: Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz.
• Scoring and Optimization: Features your model features.
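As a small illustration of the PER metric named in the list above, here is a minimal sketch in Python. It assumes one common formulation of the position-independent error rate (bag-of-words overlap against a single reference, with whitespace tokenization); the MT Talks lecture may define or normalize it differently, and the function name `per` is only illustrative.

```python
from collections import Counter


def per(hypothesis: str, reference: str) -> float:
    """Position-independent error rate against a single reference.

    Assumed formulation: (max(|hyp|, |ref|) - bag-of-words matches) / |ref|.
    Word order is ignored; only the multiset of tokens matters.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # Multiset intersection counts each token at most as often as it
    # appears in both the hypothesis and the reference.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)


if __name__ == "__main__":
    print(per("on the mat the cat sat", "the cat sat on the mat"))
    print(per("the cat sat", "the cat sat on the mat"))
```

Because word order is ignored, a scrambled but lexically complete hypothesis scores as error-free, which is one reason PER is usually read alongside an order-sensitive metric such as BLEU.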
Language Modelling
Kenneth Heafield (University of Edinburgh)
http://ufal.mff.cuni.cz/mtm15/files/09-language-modelling-kenneth-heafield.pdf
LM = 50% of CPU
Discriminative Training
Miloš Stanojević (ILLC, University of Amsterdam)
http://ufal.mff.cuni.cz/mtm15/files/10-discriminative-training-milos-stanojevic.pdf
Deep Syntactic MT and TectoMT
Martin Popel (UFAL)
http://ufal.mff.cuni.cz/mtm15/files/12-deep-syntactic-mt-and-tectomt-martin-popel.pdf
• 1.2s per sentence
• Worst in WMT 2015
• Depfix can detect & fix negation, mostly tries to fix morphological agreement
• Originally CS-EN but within QTLeap adapted to CS-EN, EN-ES, EN-NL, EN-PT, EN-EU
• 67% errors from transfer, 30% from analysis
Syntax-Based Models and Decoding
http://ufal.mff.cuni.cz/mtm15/files/19a-syntax-based-models-hieu-hoang.pdf
http://ufal.mff.cuni.cz/mtm15/files/19b-cyk-hieu-hoang.pdf
Hieu Hoang (New York University, Abu Dhabi)
Keynotes
Real-World Application of a Machine Translation Workflow
• Cost is not the most important driver; speed / shorter turnaround time is.
• Pricing is part of the business relationship. MT usage is only one of many driving factors.
Neural Network Models and Google Translate
Keith Stevens (Google)
http://ufal.mff.cuni.cz/mtm15/files/11-neural-network-models-and-google-translate-keith-stevens.pdf
Text Representations for NLP and MT
Hinrich Schütze
• Reduce sparseness with morphological analysis for better machine translation
• MarMoT - A fast and accurate morphological tagger - http://cistern.cis.lmu.de/marmot/
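To make the sparseness point concrete, the following Python sketch counts distinct word forms versus distinct lemmata in a toy sentence. The lexicon `TOY_LEMMAS` and the helper `lemmatize` are hypothetical stand-ins for a real morphological analyzer; nothing here uses MarMoT or reflects its interface.

```python
from collections import Counter

# Hypothetical toy lexicon mapping word forms to lemmata; a real system
# would obtain these from a morphological analyzer or tagger.
TOY_LEMMAS = {
    "runs": "run",
    "ran": "run",
    "running": "run",
    "houses": "house",
}


def lemmatize(tokens, lexicon=TOY_LEMMAS):
    """Lowercase each token and map it to its lemma if the lexicon knows it."""
    return [lexicon.get(tok.lower(), tok.lower()) for tok in tokens]


tokens = "She runs and he ran while they are running past the houses".split()
form_counts = Counter(tok.lower() for tok in tokens)
lemma_counts = Counter(lemmatize(tokens))

if __name__ == "__main__":
    print(len(form_counts), "distinct word forms")
    print(len(lemma_counts), "distinct lemmata")
    print(lemma_counts["run"], "occurrences of the lemma 'run'")
```

Three rare forms collapse into one better-attested lemma, which is the effect the slide relies on: fewer types, each seen more often.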
• Use embeddings for lemmata, not for word forms
• Embeddings and morphological resources provide complementary information - use both!
• Use lemmata for MT
• Use embeddings for MT
• Use linguistic morphological resources for MT
• Don’t represent sentences as vectors for MT
• Deep learning will not replace other MT work ...
• ... but will be a powerful component of MT systems.
Labs
translate5
http://www.translate5.net/login
http://ufal.mff.cuni.cz/mtm15/files/03-translate5-lab-marc-mittag.pdf
Column-based approach on data
eman
What is eman?
• A tool for managing pipelines of steps.
• Purpose independent, but bundled with an ecosystem of tools for machine translation.
• Written in Perl 5, runs on Linux (and probably other Unices).
• Tasks submitted locally or using SGE cluster.
Key Features
• Create complex experiment pipelines.
• Clone whole experiments or individual steps.
• Re-use existing steps when possible.
• Automatically resolve complex step dependencies.
• Seamlessly share steps with others.
• Generate tables of results based on customizable rules.
• Easily scriptable and hacking friendly.
https://ufal.mff.cuni.cz/eman/
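The ideas of resolving step dependencies and re-using existing steps can be sketched in a few lines of Python. This is not eman's implementation or interface (eman is written in Perl); the step names, the `PIPELINE` mapping, and the `run_order` helper are all hypothetical, and the sketch simply uses the standard-library `graphlib` module (Python 3.9+) for topological ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical experiment: each step maps to the set of steps it depends on.
PIPELINE = {
    "corpus": set(),
    "align": {"corpus"},
    "lm": {"corpus"},
    "tm": {"align"},
    "tune": {"tm", "lm"},
    "translate": {"tune"},
}


def run_order(pipeline, done=frozenset()):
    """Return the steps still to run, in an order that respects dependencies.

    Steps listed in `done` are treated as already finished and are skipped,
    which mimics re-using existing steps.
    """
    order = TopologicalSorter(pipeline).static_order()
    return [step for step in order if step not in done]


if __name__ == "__main__":
    print(run_order(PIPELINE))
    print(run_order(PIPELINE, done={"corpus", "lm"}))
```

A topological order guarantees that every step appears after everything it depends on; a cyclic pipeline would raise `graphlib.CycleError` instead of producing an order.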
Box — Moses Suite on Amazon EC2
Here's what Box v2015-05-25 beta (current release) includes:
cdec/ Popular SMT framework: http://www.cdec-decoder.org
cmph/ Hashing library (for compact phrase tables in Moses): http://cmph.sourceforge.net
ducttape/ Experiment management system (for cdec): https://github.com/jhclark/ducttape
eigen3/ Linear algebra library (for cdec): http://eigen.tuxfamily.org/index.php?title=Main_Page
fast_align/ Word alignment tool: https://github.com/clab/fast_align
giza-pp/ Word alignment package (for Moses): http://www.statmt.org/moses/giza/GIZA++.html
kenlm/ Language modeling toolkit: http://kheafield.com/code/kenlm/
mgiza/ Multi-threaded Giza++: http://www.kyloo.net/software/doku.php/mgiza:overview
mosesdecoder/ Popular SMT framework: http://www.statmt.org/moses/
multeval/ MT evaluation tool: https://github.com/jhclark/multeval
rnnlm/ Neural network language modeling toolkit: http://rnnlm.org
salm/ Suffix-array toolkit for NLP (for Moses): https://github.com/moses-smt/salm
scala/ Programming language (for cdec): http://www.scala-lang.org
vowpal_wabbit/ Machine learning toolkit compatible with Moses: http://hunch.net/~vw/
word2vec/ Continuous word representations: https://code.google.com/p/word2vec/
http://www.boxresear.ch/
Treex
http://ufal.mff.cuni.cz/treex
Papers
MT-ComparEval
Martin Popel
• Graphical evaluation interface for Machine Translation development
• web-based tool for MT developers
• check progress of a system over time or compare several MT systems
• focus on analyzing system differences
• API for uploading translations
• Try it now - http://wmt.ufal.cz
• Install it - https://github.com/choko/MT-ComparEval/
• Online A = Bing?
• Online B = Google?
• systems = tasks
• Newest version of BLEU
• Some sentence level smoothing
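The notes above do not say which smoothing MT-ComparEval applies, so the Python sketch below only illustrates why sentence-level BLEU needs smoothing at all. It assumes add-one smoothing of the higher-order n-gram precisions, a single reference, and whitespace tokenization; the function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU with add-one smoothing for n > 1 (an assumption)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matches = sum((hyp_ngrams & ref_ngrams).values())
        total = sum(hyp_ngrams.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            # Add-one smoothing keeps a missing 3- or 4-gram match
            # from zeroing the whole score.
            precision = (matches + 1) / (total + 1)
        log_precision += math.log(precision) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))
    print(sentence_bleu("the cat sat on a mat", "the cat sat on the mat"))
```

Without the smoothing term, any sentence lacking a single 4-gram match would score exactly zero, which makes unsmoothed BLEU nearly useless for comparing individual sentences.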
Joshua 6
• (New!) Phrase-based decoder (no OSM or lexical distortion)
• (New!) Language packs
CloudLM: a Cloud-based Language Model for Machine Translation
Evaluating MT systems with BEER
Sampling Phrase Tables for the Moses Statistical Machine Translation System
Projects
Docker
PDF 2 Bitext
MT4NLTK
Appraise++
An open-source system for manual evaluation of MT output
It supports collaborative collection of human feedback for MT evaluation.
It implements tasks such as Translation Quality Checking, Ranking and Error Classification, and Manual Post-Editing.
http://appraise.cf/
LM prefetch
Segmentation-Aware Language Model
Deep Machine Translation
Workshop 2015
Prague, Czech Republic
September 3-4, 2015
TectoMT Seminar 2015
Prague, Czech Republic
September 3-4, 2015
100% Acceptance Rate!
Charles University in Prague
Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Trdelník
Vltava
Národní technické muzeum
Thank you!
MTM 2015

  • 1. Tenth MT Marathon 2015 Prague, Czech Republic September 7-12, 2015
  • 3. MT Evaluation Yvette Graham (DCU) http://ufal.mff.cuni.cz/mtm15/files/01-mt-evaluation-yvette-graham.pdf
  • 7. Introduction to Machine Translation and Phrase-Based Machine Translation http://ufal.mff.cuni.cz/mtm15/files/04-pbmt-introduction-ales-tamchyna.pdf Aleš Tamchyna (UFAL)
  • 8. MT Talks Ondřej Bojar • http://mttalks.ufal.cz/ • Mini-lectures on MT • Coding Exercises that complement the lectures
  • 9. MT Talks Ondřej Bojar • Intro: Why is MT difficult, approaches to MT. • MT that Deceives: Serious translation errors even for short and simple inputs. • Pre-processing: Normalization and other technical tricks bound to help your MT system. • MT Evaluation in General: Techniques of judging MT quality, dimensions of translation quality, number of possible translations. • Automatic MT Evaluation: Two common automatic MT evaluation methods: PER and BLEU • Data Acquisition: The need and possible sources of training data for MT, the diminishing utility of the new data additions due to Zipf's law. • Sentence Alignment: An introduction to the Gale & Church sentence alignment algorithm. • Word Alignment: Cutting the chicken-egg problem. • Phrase-based Model: Copy if you can. • Constituency Trees: Divide and conquer. • Dependency Trees: Trees with gaps. • Rich Vocabulary: Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz. • Scoring and Optimization: Features your model features.
  • 10. Language Modelling Kenneth Heafield (University of Edinburgh) http://ufal.mff.cuni.cz/mtm15/files/09-language-modelling-kenneth-heafield.pdf LM = 50% of CPU
  • 11. Discriminative Training Miloš Stanojević (ILLC, University of Amsterdam) http://ufal.mff.cuni.cz/mtm15/files/10-discriminative-training-milos-stanojevic.pdf
  • 12. Deep Syntactic MT and TectoMT Martin Popel (UFAL) http://ufal.mff.cuni.cz/mtm15/files/12-deep-syntactic-mt-and-tectomt-martin-popel.pdf
  • 13. Deep Syntactic MT and TectoMT • 1.2s per sentence • Worst in WMT 2015 • Depfix can detect & fix negation, mostly tries to fix morphological agreement • Originally CS-EN but within QTLeap adapted to CS-EN, EN-ES, EN-NL, EN-PT, EN-EU • 67% errors from transfer, 30% from analysis Martin Popel (UFAL) http://ufal.mff.cuni.cz/mtm15/files/12-deep-syntactic-mt-and-tectomt-martin-popel.pdf
  • 14. Syntax-Based Models and Decoding http://ufal.mff.cuni.cz/mtm15/files/19a-syntax-based-models-hieu-hoang.pdf http://ufal.mff.cuni.cz/mtm15/files/19b-cyk-hieu-hoang.pdf Hieu Hoang (New York University, Abu Dhabi)
  • 16. Real-World Application of a Machine Translation Workflow • Cost is not the most important driver; speed and shorter turnaround time are. • Pricing is part of the business relationship. MT usage is only one of many driving factors.
  • 18. Neural Network Models and Google Translate Keith Stevens (Google) http://ufal.mff.cuni.cz/mtm15/files/11-neural-network-models-and-google-translate-keith-stevens.pdf
  • 26. Text Representations for NLP and MT Hinrich Schütze • Reduce sparseness with morphological analysis for better machine translation • MarMoT - A fast and accurate morphological tagger - http://cistern.cis.lmu.de/marmot/
  • 27. Text Representations for NLP and MT Hinrich Schütze • Use embeddings for lemmata, not for word forms • Embeddings and morphological resources provide complementary information - use both!
  • 30. Text Representations for NLP and MT Hinrich Schütze • Use lemmata for MT • Use embeddings for MT • Use linguistic morphological resources for MT • Don’t represent sentences as vectors for MT • Deep learning will not replace other MT work ... • ... but will be a powerful component of MT systems.
  • 31. Labs
  • 33. eman What is eman? • A tool for managing pipelines of steps. • Purpose independent, but bundled with an ecosystem of tools for machine translation. • Written in Perl 5, runs on Linux (and probably other Unices). • Tasks submitted locally or using SGE cluster. Key Features • Create complex experiment pipelines. • Clone whole experiments or individual steps. • Re-use existing steps when possible. • Automatically resolve complex step dependencies. • Seamlessly share steps with others. • Generate tables of results based on customizable rules. • Easily scriptable and hacking friendly. https://ufal.mff.cuni.cz/eman/
  • 34. Box — Moses Suite on Amazon EC2 Here's what Box v2015-05-25 beta (current release) includes: cdec/ Popular SMT framework: http://www.cdec-decoder.org cmph/ Hashing library (for compact phrase tables in Moses): http://cmph.sourceforge.net ducttape/ Experiment management system (for cdec): https://github.com/jhclark/ducttape eigen3/ Linear algebra library (for cdec): http://eigen.tuxfamily.org/index.php?title=Main_Page fast_align/ Word alignment tool : https://github.com/clab/fast_align giza-pp/ Word alignment package (for Moses): http://www.statmt.org/moses/giza/GIZA++.html kenlm/ Language modeling toolkit: http://kheafield.com/code/kenlm/ mgiza/ Multi-threaded Giza++ : http://www.kyloo.net/software/doku.php/mgiza:overview mosesdecoder/ Popular SMT framework: http://www.statmt.org/moses/ multeval/ MT evaluation tool: https://github.com/jhclark/multeval rnnlm/ Neural network language modeling toolkit: http://rnnlm.org salm/ Suffix-array toolkit for NLP (for Moses): https://github.com/moses-smt/salm scala/ Programming language (for cdec): http://www.scala-lang.org vowpal_wabbit/ Machine learning toolkit compatible with Moses: http://hunch.net/~vw/ word2vec/ Continuous word representations: https://code.google.com/p/word2vec/ http://www.boxresear.ch/
  • 37. MT-ComparEval Martin Popel • Graphical evaluation interface for Machine Translation development • web-based tool for MT developers • check progress of a system over time or compare several MT systems • focus on analyzing system differences • API for uploading translations • Try it now - http://wmt.ufal.cz • Install it - https://github.com/choko/MT-ComparEval/
  • 38. MT-ComparEval Martin Popel • Online A = Bing? • Online B = Google? • systems = tasks • Newest version of BLEU • Some sentence level smoothing
  • 42. Joshua 6 • (New!) Phrase-based decoder (no OSM or lexical distortion) • (New!) Language packs
  • 43. CloudLM: a Cloud-based Language Model for Machine Translation
  • 45. Sampling Phrase Tables for the Moses Statistical Machine Translation System
  • 50. Appraise++ An open-source system for manual evaluation of MT output It supports collaborative collection of human feedback for MT evaluation. It implements tasks such as Translation Quality Checking, Ranking and Error Classification, and Manual Post-Editing. http://appraise.cf/
  • 53. Deep Machine Translation Workshop 2015 Prague, Czech Republic September 3-4, 2015
  • 54. TectoMT Seminar 2015 Prague, Czech Republic September 3-4, 2015 100% Acceptance Rate!
  • 55. Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics
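
Slide 9 names PER (position-independent error rate) as one of two common automatic MT evaluation methods but does not show how it is computed. The sketch below is not taken from the slides. It assumes whitespace tokenization, a single reference, and one commonly cited formulation of PER that treats both sentences as bags of words; the function name `per` is made up for illustration.

```python
from collections import Counter


def per(hypothesis: str, reference: str) -> float:
    """Position-independent error rate under one common formulation (an assumption).

    PER = 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref)
    where `matches` is the size of the bag-of-words overlap.
    """
    hyp = hypothesis.split()  # naive whitespace tokenization (assumption)
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Multiset intersection: each reference word can be matched at most once.
    matches = sum((Counter(hyp) & Counter(ref)).values())
    # Penalize hypothesis words beyond the reference length.
    extra = max(0, len(hyp) - len(ref))
    return 1.0 - (matches - extra) / len(ref)


if __name__ == "__main__":
    print(per("sat cat the", "the cat sat"))  # word order is ignored
    print(per("the dog sat", "the cat sat"))  # one substituted word
```

Because word order is ignored, a scrambled but lexically identical hypothesis scores the same as a perfect one, which is why PER is usually reported alongside order-sensitive metrics such as BLEU.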