Institut für Anthropomatik, 24.06.13, Simon Hummel – Lehrstuhl Prof. Waibel
Grammatical Agreement in SMT
Seminar Sprach-zu-Sprach-Übersetzung (Speech-to-Speech Translation)
Summer semester (SS) 2013
Inflection and Agreement
Inflection
– Modification of a word
– Signals grammatical variants (tense, gender, case, …)
– e.g. walk vs. walked
Agreement
– Inflections of related words in a sentence have to agree
– e.g. das Haus (correct) vs. *die Haus (article gender does not agree with the noun)
Some languages are weakly inflected (e.g. English)
Some are highly inflected (e.g. German, Arabic, …)
Agreement Errors
Local agreement errors
Ref: the-car(F) go(F) with-speed
Hypo: the-car(F) go(M) with-speed
Long-distance agreement errors
Ref: celle qui parle , c’est ma femme
Gloss: one(F) who speak , is my wife(F)
Hypo: celui qui parle est ma femme
Gloss: one(M) who speak is my spouse(F)
Approaches for SMT
Morphological Generation
– Create raw stems and modify with predicted inflection
Agreement Constraints
– Use SCFG of target and add constraints to it
Class-based Agreement Model
– Use morphological word classes “Noun+Def+Sg+Fem”
Morphological Generation: Idea
“Generating Complex Morphology for Machine Translation” (Minkov and Toutanova, 2007)
Convert MT output to stem sequence
Predict an inflection for every stem
Reflect meaning and comply with agreement rules
Morphological Generation: Lexicons
Morphology analysis and generation
Operations:
– Stemming
– Inflection
– Morphological analysis
Create manually
Create automatically from data
Here: assumed as given
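The three lexicon operations can be pictured as lookups over one table of (surface form, stem, analysis) entries. The following is only a minimal sketch: the class `ToyLexicon`, its method names, and the German entries are hypothetical illustrations, not the lexicons used by Minkov and Toutanova.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


class ToyLexicon:
    """Hypothetical lookup-table lexicon supporting the three operations."""

    def __init__(self, entries: Iterable[Tuple[str, str, str]]) -> None:
        # Each entry is (surface form, stem, morphological analysis).
        self._stems = defaultdict(set)
        self._forms = defaultdict(set)
        self._analyses = defaultdict(set)
        for form, stem, analysis in entries:
            self._stems[form].add(stem)
            self._forms[stem].add(form)
            self._analyses[form].add(analysis)

    def stems(self, form: str) -> Set[str]:
        """Stemming: all stems a surface form may belong to."""
        return set(self._stems.get(form, ()))

    def inflections(self, stem: str) -> Set[str]:
        """Inflection: all surface forms known for a stem."""
        return set(self._forms.get(stem, ()))

    def analyses(self, form: str) -> Set[str]:
        """Morphological analysis: all analyses of a surface form."""
        return set(self._analyses.get(form, ()))


# Illustrative entries only (assumed, not taken from the slides).
lexicon = ToyLexicon([
    ("Haus", "Haus", "Noun+Sg+Nom"),
    ("Hauses", "Haus", "Noun+Sg+Gen"),
    ("Häuser", "Haus", "Noun+Pl+Nom"),
])
```

Each operation returns a set because a surface form can be ambiguous, so the prediction model has to choose among several candidates.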
Morphological Generation: Inflection Prediction
Maximum entropy Markov model (2nd order)
Features:
– Monolingual
– Bilingual
– Lexical
– Morphological
– Syntactic
$$p(\bar{y} \mid \bar{x}) = \prod_{t=1}^{n} p(y_t \mid y_{t-1}, y_{t-2}, x_t), \quad y_t \in I_t$$
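To make the factorization concrete, the sketch below multiplies the local probabilities along one candidate inflection sequence. The function `toy_local_model` is a hypothetical stand-in for the trained maximum entropy classifier: it simply favours repeating the previous label, as a crude proxy for agreement. All names and numbers are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def toy_local_model(prev1: str, prev2: str, x_t: str,
                    candidates: Sequence[str]) -> Dict[str, float]:
    """Hypothetical p(y_t | y_{t-1}, y_{t-2}, x_t), normalized over I_t."""
    # Weight 3 for repeating the previous label, weight 1 otherwise.
    weights = {y: (3.0 if y == prev1 else 1.0) for y in candidates}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}


def sequence_log_prob(stems: Sequence[str], labels: Sequence[str],
                      candidate_sets: Sequence[Sequence[str]]) -> float:
    """Log of the product over t of the second-order local probabilities."""
    prev1, prev2 = "<s>", "<s>"  # padding for the first two positions
    total = 0.0
    for x_t, y_t, candidates in zip(stems, labels, candidate_sets):
        dist = toy_local_model(prev1, prev2, x_t, candidates)
        total += math.log(dist[y_t])  # y_t must be a member of I_t
        prev2, prev1 = prev1, y_t
    return total


stems: List[str] = ["d", "Haus"]
candidate_sets = [["Masc", "Fem", "Neut"], ["Fem", "Neut"]]
agreeing = sequence_log_prob(stems, ["Neut", "Neut"], candidate_sets)
clashing = sequence_log_prob(stems, ["Fem", "Neut"], candidate_sets)
```

With these toy weights the agreeing sequence gets probability 1/3 · 3/4 = 1/4 and the clashing one 1/3 · 1/4 = 1/12. A real system would search over all sequences, for example with Viterbi or beam search, rather than score two by hand.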
Morphological Generation: Evaluation
English-Russian and English-Arabic
Technical (software manual) domain
Input: aligned sentence pairs with reference translations (not MT system output), to reduce noise
Accuracy (%) results
Morphological Generation: Conclusion
Needed resources:
– Large corpus of aligned sentence pairs
– Lexicons (source and target) with the three operations
+ Better accuracy than simple LM (even with small training data)
+ Easy to add to existing MT system
- Expensive creation of lexicons
Constraints: Idea
“Agreement Constraints for Statistical Machine Translation into German” (Williams and Koehn, 2011)
String-to-tree model
Synchronous grammar for target language
Adding learned constraints and probabilities
Evaluation of constraints during decoding
Constraints: Feature Structure
Feature structure
Unification
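Unification can be illustrated on flat feature structures, here represented as plain dictionaries. This is a simplified sketch under stated assumptions: real feature structures may be nested and a word may carry a set of alternative structures. The feature names and values are hypothetical.

```python
from typing import Dict, Optional

FeatureStructure = Dict[str, str]


def unify(a: FeatureStructure,
          b: FeatureStructure) -> Optional[FeatureStructure]:
    """Merge two flat feature structures; return None if any value clashes."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None  # conflicting values: unification fails
        result[feature] = value
    return result


# Illustrative values only; "die" is treated as unambiguous here.
die = {"case": "nom", "gender": "fem", "number": "sg"}
katze = {"gender": "fem", "number": "sg"}
haus = {"gender": "neut", "number": "sg"}

ok = unify(die, katze)     # compatible: all shared features match
clash = unify(die, haus)   # gender fem vs. neut: fails
```

A failed unification is what signals an agreement violation, as in the ungrammatical phrase "die Haus".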
Constraints: Grammar
Synchronous grammar learned from parallel corpus
Extended by constraints at target-side
Sample rule/constraint:
NP-SB → the X_1 cat | die AP_1 Katze
Constraints: Training
Propagation rules to capture NP/PP agreement, applied bottom-up
Constraints: Decoding
Model:
Every element of rule/constraint has a feature structure
Constraint evaluation: each hypothesis stores the set of feature structures corresponding to its root rule element
Recombination of hypotheses is possible
$$\hat{t} = \arg\max_{t} p(t \mid s)$$
$$p(t \mid s) = \frac{1}{Z} \sum_{i=1}^{n} \lambda_i h_i(s, t)$$
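Because the normalizer Z is the same for every candidate translation of one source sentence, the arg max only needs the weighted feature sum. The sketch below picks the best of a few hypotheses that way. The feature names (`lm`, `tm`), weights, and values are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, Dict[str, float]]  # (translation, feature values h_i)


def model_score(features: Dict[str, float],
                weights: Dict[str, float]) -> float:
    """Weighted feature sum: sum over i of lambda_i * h_i(s, t)."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses: List[Hypothesis],
                    weights: Dict[str, float]) -> str:
    """Arg max over candidates; Z is constant per source and can be dropped."""
    return max(hypotheses, key=lambda h: model_score(h[1], weights))[0]


# Illustrative log-probability features (assumed values).
weights = {"lm": 1.0, "tm": 1.0}
hypotheses = [
    ("die Haus", {"lm": -4.0, "tm": -2.0}),
    ("das Haus", {"lm": -3.0, "tm": -2.5}),
]
best = best_hypothesis(hypotheses, weights)
```

In a real decoder the hypotheses are built incrementally by chart parsing rather than enumerated in a list.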
Constraints: Evaluation
English-German
Europarl and News Commentary
Parsing: BitPar; Alignment: GIZA++; SCFG rules: Moses toolkit
Treebank for target
Grammar: ~140 million rules
BLEU scores and p-values for three test sets
Constraints: Conclusion
Needed resources:
– Parallel corpus
– Heuristics for constraint extraction
+ Improvement in translation accuracy
- Improvement is quite small
Class-Based: Idea
“A Class-Based Agreement Model for Generating Accurately Inflected Translations” (Green and DeNero, 2012)
During decoding
Target-side
Three steps:
1. Segmentation
2. Tagging
3. Scoring
Class-Based: Segmentation
Train a conditional random field (CRF)
Applied during decoding, not as a preprocessing step
Features:
– Centered 5-character window
Labels:
I: Continuation (Inside)
O: Outside (whitespace)
B: Beginning
F: Non-native chars
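Once each character carries one of the four labels, the segments can be read off in a single pass. The sketch below assumes the labels are already predicted (no CRF is trained here). It also assumes that a run of F-labelled characters forms its own segment, which is a choice made for this illustration. The transliterated input is hypothetical.

```python
from typing import List, Sequence


def segments_from_labels(chars: Sequence[str],
                         labels: Sequence[str]) -> List[str]:
    """Group characters into segments using per-character I/O/B/F labels."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    segments: List[str] = []
    current = ""
    prev = "O"
    for ch, label in zip(chars, labels):
        if label == "O":
            # Whitespace closes the open segment and is itself dropped.
            if current:
                segments.append(current)
            current = ""
        elif label == "B" or (label == "F" and prev != "F") or not current:
            # Start a new segment (assumption: an F run is its own segment).
            if current:
                segments.append(current)
            current = ch
        else:
            # "I", or "F" continuing an F run: extend the open segment.
            current += ch
        prev = label
    if current:
        segments.append(current)
    return segments


# Hypothetical transliterated example: a one-letter prefix, a word, a foreign char.
example = segments_from_labels(list("wktab x"),
                               ["B", "B", "I", "I", "I", "O", "F"])
```

Here `example` is `["w", "ktab", "x"]`: the prefix is split off, the whitespace is dropped, and the foreign character is kept separate.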
Class-Based: Tagging
Train CRF on full sentences with gold classes
Features:
– Current and previous words, affixes, etc.
Labels:
– Morphological classes
→ Gender, number, person, definiteness
– e.g. 89 classes for Arabic
Example:
'the car'
Tagged: “Noun+Def+Sg+Fem”
Class-Based: Scoring
Scoring of word sequences not comparable across hypotheses
→ Scoring class sequences with generative model
Simple bigram LM over gold class sequences (add-1 smoothed)
$$\tau' = \arg\max_{\tau} p(\tau \mid \hat{s})$$
$$q(e) = p(\tau') = \prod_{i=1}^{I} p(\tau'_i \mid \tau'_{i-1})$$
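An add-1 smoothed bigram model over class sequences fits in a few lines. The sketch below is an illustration under assumptions: the class name `BigramClassLM`, the `<s>` start symbol, and the tiny training set are hypothetical, and no end-of-sequence symbol is modelled.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


class BigramClassLM:
    """Add-1 smoothed bigram model over morphological class sequences."""

    def __init__(self, sequences: Iterable[Sequence[str]]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = set()
        for seq in sequences:
            prev = "<s>"  # assumed start-of-sequence symbol
            for cls in seq:
                self.bigrams[(prev, cls)] += 1
                self.contexts[prev] += 1
                self.vocab.add(cls)
                prev = cls

    def prob(self, prev: str, cls: str) -> float:
        """p(cls | prev) with add-1 smoothing over the class vocabulary."""
        return (self.bigrams[(prev, cls)] + 1) / (
            self.contexts[prev] + len(self.vocab))

    def log_score(self, seq: Sequence[str]) -> float:
        """log q(e): sum of log bigram probabilities along the sequence."""
        total, prev = 0.0, "<s>"
        for cls in seq:
            total += math.log(self.prob(prev, cls))
            prev = cls
        return total


# Tiny illustrative "gold" class sequences (assumed data).
lm = BigramClassLM([
    ["Noun+Def+Sg+Fem", "Verb+Sg+Fem"],
    ["Noun+Def+Sg+Fem", "Verb+Sg+Fem"],
    ["Noun+Def+Sg+Masc", "Verb+Sg+Masc"],
])
agree = lm.log_score(["Noun+Def+Sg+Fem", "Verb+Sg+Fem"])
clash = lm.log_score(["Noun+Def+Sg+Fem", "Verb+Sg+Masc"])
```

On this toy data the agreeing sequence scores 3/7 · 3/6 = 3/14 and the clashing one 3/7 · 1/6 = 1/14, so a gender mismatch between noun and verb is penalized.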
Class-Based: Evaluation
English-Arabic
Training data: variety of sources (e.g. web)
Development and test: NIST sets (newswire and mixed genre [broadcast news, newsgroups, weblog])
Phrase-based decoder
BLEU score for newswire sets
BLEU score for mixed genre sets
Class-Based: Conclusion
Needed resources:
– Treebank for target (existing for many languages)
– Large target corpus
+ Improves translation quality
+ Easy to integrate in existing MT system
- Increases decoding time
- Not very good for mixed genres
References
Green, S. and DeNero, J. (2012). “A Class-Based Agreement Model for Generating Accurately Inflected Translations”. In: ACL.
Williams, P. and Koehn, P. (2011). “Agreement Constraints for Statistical Machine Translation into German”. In: Sixth Workshop on Statistical Machine Translation.
Minkov, E. and Toutanova, K. (2007). “Generating Complex Morphology for Machine Translation”. In: ACL.