SlideShare une entreprise Scribd logo
1  sur  11
A Statistical Approach to
  Machine Translation
  Peter F. Brown, John Cocke, Stephen A. Della Pietra,
  Vincent J. Della Pietra, Fredrick Jeinek, John D. Lafferty,
  Robert L. Mercer, and Paul S. Roossin

                                     Published in:
                                     · Journal
                                     Computational Linguistics archive
                                     Volume 16 Issue 2, June 1990
                                     Pages 79-85
                                     MIT Press Cambridge, MA, USA
Introduction
• (S, T): Every pair of sentences
• Pr(T|S): probability of T when presented with S
• Given a sentence T in the target language, seek
  the sentence S from which the translator
  produced T.
• Chance of error is minimized by choosing the
  most probable sentence S given T
  –   In other words, to maximize Pr(S|T)
  –   Pr(S|T) = Pr(S) Pr(T|S) / Pr(T) : Bayes’ theorem
  –   Pr(S): language model probability of S
  –   Pr(T|S): translation probability of T given S
• Statistical Translation System requires:
  – A method for computing language model
    probabilities
  – A method for computing translation probabilities
  – A method for searching among possible source
    sentences S for the one that gives the greatest
    value for Pr(S)Pr(T|S)
The Language Model
• a word string S1, S2, … Sn
• Pr(S1, S2, … Sn)
              = Pr(S1) Pr(S2|S1)…Pr(Sn|S1S2…Sn-1)
• Given a history S1S2…Sj-1, you must know the
  probability of object word Sj
• Histories are too long, probability parameters cannot
  be separated
• To reduce parameters:
   – Categorize each history into same class
   – Probability of object depends on the history in same class
The Translation Model
• A word can be translated into more than one word
   – Fertility: num. of T words that an S word produces
• Notation for alignment:
   – (Jean aime Marie | John(1) loves(2) Mary(3) )
      • ( T | S)
      • Num. in S words are positions in T
• Computing the probability of the alignment:
   – (Le chien est battu par Jean | John(6) does beat(3, 4)
     the(1) dog(2))
The Translation Model
• In English adjectives precede nouns, in French
  adj. follows nouns
  – Distortion: T word appears far from S word in
    alignment
• Distortion probability:
  – Pr(i|j, l)
     • i: a target position
     • J: a source position
     • l: the target length
Searching
• Searching for the sentence S to maximize
  Pr(S)Pr(T|S)
• Uses stack search
  – A list of partial alignment hypothesis
  – (Jean aime Marie| * )
     • *: a place holder for unknown sequence of S
  – In iteration, extends most promising entries and adds
    to its hypothesis
  – Ends when complete alignment is the most promising
    significantly
Parameter Estimation
• Both LM and TM have many parameters
• To estimate, needs pairs of translations
• For this experiment, they used Canadian
  parliament’s records translated in
  English/French
  – Param. & LM/TM can be estimated from this
Two Pilot Experiments
• First experiment: to estimate params for TM
• 9000 most common words in English/French
  from Hansard data
Two Pilot Experiment
• Second experiment:
   – French to English
   – 1,000 most frequently used English words
   – 1,700 most frequently used French words ( covered by the 1k
     English words)
   – Estimated 17 million parameters of translation model from
     117,000 pairs of sentences
   – Est. bigram language model from 570,000 sentences from the
     English part of Hansard
   – Evaluation:
      •   Exact: Decoded sentence was exactly the same
      •   Alternate: same meaning, slightly different words
      •   Different: not covey the same meaning as the translation
      •   Wrong: makes a sense but not interpreted
      •   Ungrammatical: grammatically deficient
Translation Example/Result

Contenu connexe

En vedette

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...TAUS - The Language Data Network
 
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engineTAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engineTAUS - The Language Data Network
 
Machine Translation: What it is?
Machine Translation: What it is?Machine Translation: What it is?
Machine Translation: What it is?Multilizer
 
Google services
Google servicesGoogle services
Google servicestahamj1987
 
Designing e-Learning Content for Localization
Designing e-Learning Content for LocalizationDesigning e-Learning Content for Localization
Designing e-Learning Content for LocalizationSumaLatam
 
Sec16.3: Reordering Integration
Sec16.3: Reordering IntegrationSec16.3: Reordering Integration
Sec16.3: Reordering Integrationvarbalow
 
Assamese to English Statistical Machine Translation
Assamese to English Statistical Machine TranslationAssamese to English Statistical Machine Translation
Assamese to English Statistical Machine TranslationKalyanee Baruah
 
Data Localization and Translation
Data Localization and TranslationData Localization and Translation
Data Localization and TranslationYevhen Shyshkin
 
Going Global? The ABC of Localization-Friendly Content
Going Global? The ABC of Localization-Friendly ContentGoing Global? The ABC of Localization-Friendly Content
Going Global? The ABC of Localization-Friendly ContentSumaLatam
 
Translation & Localization
Translation & LocalizationTranslation & Localization
Translation & LocalizationVengaGlobal
 
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...SDL
 
TAUS webinar The Big Picture View On The Translation Industry, March 2013
TAUS webinar The Big Picture View On The Translation Industry, March 2013TAUS webinar The Big Picture View On The Translation Industry, March 2013
TAUS webinar The Big Picture View On The Translation Industry, March 2013TAUS - The Language Data Network
 
Localization and globalization in c#
Localization and globalization in c#Localization and globalization in c#
Localization and globalization in c#PaYal Umraliya
 
A Computational Model of Staged Language Acquisition
A Computational Model of Staged Language AcquisitionA Computational Model of Staged Language Acquisition
A Computational Model of Staged Language AcquisitionKris Jack
 
Statistical machine translation for indian language copy
Statistical machine translation for indian language   copyStatistical machine translation for indian language   copy
Statistical machine translation for indian language copyNakul Sharma
 
C#: Globalization and localization
C#: Globalization and localizationC#: Globalization and localization
C#: Globalization and localizationRohit Vipin Mathews
 

En vedette (20)

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Manuel Herranz, Pangean...
 
WEBINAR: TAUS Outlook 2013
WEBINAR: TAUS Outlook 2013WEBINAR: TAUS Outlook 2013
WEBINAR: TAUS Outlook 2013
 
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engineTAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
 
Machine Translation: What it is?
Machine Translation: What it is?Machine Translation: What it is?
Machine Translation: What it is?
 
Google services
Google servicesGoogle services
Google services
 
Designing e-Learning Content for Localization
Designing e-Learning Content for LocalizationDesigning e-Learning Content for Localization
Designing e-Learning Content for Localization
 
Sec16.3: Reordering Integration
Sec16.3: Reordering IntegrationSec16.3: Reordering Integration
Sec16.3: Reordering Integration
 
Escaping style and script data
Escaping style and script dataEscaping style and script data
Escaping style and script data
 
Translationusing moses1
Translationusing moses1Translationusing moses1
Translationusing moses1
 
Assamese to English Statistical Machine Translation
Assamese to English Statistical Machine TranslationAssamese to English Statistical Machine Translation
Assamese to English Statistical Machine Translation
 
Data Localization and Translation
Data Localization and TranslationData Localization and Translation
Data Localization and Translation
 
Going Global? The ABC of Localization-Friendly Content
Going Global? The ABC of Localization-Friendly ContentGoing Global? The ABC of Localization-Friendly Content
Going Global? The ABC of Localization-Friendly Content
 
Translation & Localization
Translation & LocalizationTranslation & Localization
Translation & Localization
 
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
 
TAUS webinar The Big Picture View On The Translation Industry, March 2013
TAUS webinar The Big Picture View On The Translation Industry, March 2013TAUS webinar The Big Picture View On The Translation Industry, March 2013
TAUS webinar The Big Picture View On The Translation Industry, March 2013
 
Localization and globalization in c#
Localization and globalization in c#Localization and globalization in c#
Localization and globalization in c#
 
Localization framework
Localization frameworkLocalization framework
Localization framework
 
A Computational Model of Staged Language Acquisition
A Computational Model of Staged Language AcquisitionA Computational Model of Staged Language Acquisition
A Computational Model of Staged Language Acquisition
 
Statistical machine translation for indian language copy
Statistical machine translation for indian language   copyStatistical machine translation for indian language   copy
Statistical machine translation for indian language copy
 
C#: Globalization and localization
C#: Globalization and localizationC#: Globalization and localization
C#: Globalization and localization
 

Similaire à A statistical approach to machine translation

Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationHrishikesh Nair
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched textsHarsh Jhamtani
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
OUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language ModelingOUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language ModelingFlorian Leitner
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Lifeng (Aaron) Han
 
Bilingual terminology mining
Bilingual terminology miningBilingual terminology mining
Bilingual terminology miningEstelle Delpech
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingSean Golliher
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
textprocessingboth.pptx
textprocessingboth.pptxtextprocessingboth.pptx
textprocessingboth.pptxbdiot
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Rajnish Raj
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter Systemkkkseld
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systemspaperpublications3
 

Similaire à A statistical approach to machine translation (20)

Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Word level language identification in code-switched texts
Word level language identification in code-switched textsWord level language identification in code-switched texts
Word level language identification in code-switched texts
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
OUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language ModelingOUTDATED Text Mining 2/5: Language Modeling
OUTDATED Text Mining 2/5: Language Modeling
 
Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...Pptphrase tagset mapping for french and english treebanks and its application...
Pptphrase tagset mapping for french and english treebanks and its application...
 
Bilingual terminology mining
Bilingual terminology miningBilingual terminology mining
Bilingual terminology mining
 
Lecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document ParsingLecture 7- Text Statistics and Document Parsing
Lecture 7- Text Statistics and Document Parsing
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
haenelt.ppt
haenelt.ppthaenelt.ppt
haenelt.ppt
 
More than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic AnalysisMore than Words: Advancing Prosodic Analysis
More than Words: Advancing Prosodic Analysis
 
Syntax
SyntaxSyntax
Syntax
 
textprocessingboth.pptx
textprocessingboth.pptxtextprocessingboth.pptx
textprocessingboth.pptx
 
Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...Natural Language processing Parts of speech tagging, its classes, and how to ...
Natural Language processing Parts of speech tagging, its classes, and how to ...
 
Speech To Sign Language Interpreter System
Speech To Sign Language Interpreter SystemSpeech To Sign Language Interpreter System
Speech To Sign Language Interpreter System
 
Nlp
NlpNlp
Nlp
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NLP and Deep Learning
NLP and Deep LearningNLP and Deep Learning
NLP and Deep Learning
 
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech SystemsPhonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
 

Plus de Hiroshi Matsumoto

Phrase linguistic classification and generalization for improving statistical...
Phrase linguistic classification and generalization for improving statistical...Phrase linguistic classification and generalization for improving statistical...
Phrase linguistic classification and generalization for improving statistical...Hiroshi Matsumoto
 
Paraphrasing Swedish Compound Nouns in Machine Translation
Paraphrasing Swedish Compound Nouns in Machine TranslationParaphrasing Swedish Compound Nouns in Machine Translation
Paraphrasing Swedish Compound Nouns in Machine TranslationHiroshi Matsumoto
 
Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...
Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...
Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...Hiroshi Matsumoto
 
Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...
Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...
Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...Hiroshi Matsumoto
 
Improving translation via targeted paraphrasing
Improving translation via targeted paraphrasingImproving translation via targeted paraphrasing
Improving translation via targeted paraphrasingHiroshi Matsumoto
 
Summary: A Sense-Based Translation Model for Statistical Machine Translation
Summary: A Sense-Based Translation Model for Statistical Machine TranslationSummary: A Sense-Based Translation Model for Statistical Machine Translation
Summary: A Sense-Based Translation Model for Statistical Machine TranslationHiroshi Matsumoto
 
Predicting Power Relations between Participants in Written Dialog from a Sing...
Predicting Power Relations between Participants in Written Dialog from a Sing...Predicting Power Relations between Participants in Written Dialog from a Sing...
Predicting Power Relations between Participants in Written Dialog from a Sing...Hiroshi Matsumoto
 
10.combination of sm_tn_rbmt
10.combination of sm_tn_rbmt10.combination of sm_tn_rbmt
10.combination of sm_tn_rbmtHiroshi Matsumoto
 
9. cgc parser with_norml_std
9. cgc parser with_norml_std9. cgc parser with_norml_std
9. cgc parser with_norml_stdHiroshi Matsumoto
 
Summary of English Japanese Translation by MSR-MT
Summary of English Japanese Translation by MSR-MTSummary of English Japanese Translation by MSR-MT
Summary of English Japanese Translation by MSR-MTHiroshi Matsumoto
 
Approach to japanese english automatic translation by Susumu Kuno
Approach to japanese english automatic translation by Susumu KunoApproach to japanese english automatic translation by Susumu Kuno
Approach to japanese english automatic translation by Susumu KunoHiroshi Matsumoto
 

Plus de Hiroshi Matsumoto (17)

Phrase linguistic classification and generalization for improving statistical...
Phrase linguistic classification and generalization for improving statistical...Phrase linguistic classification and generalization for improving statistical...
Phrase linguistic classification and generalization for improving statistical...
 
Paraphrasing Swedish Compound Nouns in Machine Translation
Paraphrasing Swedish Compound Nouns in Machine TranslationParaphrasing Swedish Compound Nouns in Machine Translation
Paraphrasing Swedish Compound Nouns in Machine Translation
 
Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...
Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...
Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Mac...
 
Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...
Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...
Summary of Dialectal to standard Arabic paraphrasing to improve Arabic-Englis...
 
Improving translation via targeted paraphrasing
Improving translation via targeted paraphrasingImproving translation via targeted paraphrasing
Improving translation via targeted paraphrasing
 
Summary: A Sense-Based Translation Model for Statistical Machine Translation
Summary: A Sense-Based Translation Model for Statistical Machine TranslationSummary: A Sense-Based Translation Model for Statistical Machine Translation
Summary: A Sense-Based Translation Model for Statistical Machine Translation
 
Predicting Power Relations between Participants in Written Dialog from a Sing...
Predicting Power Relations between Participants in Written Dialog from a Sing...Predicting Power Relations between Participants in Written Dialog from a Sing...
Predicting Power Relations between Participants in Written Dialog from a Sing...
 
Modeling Irony in Twitter
Modeling Irony in TwitterModeling Irony in Twitter
Modeling Irony in Twitter
 
Factored translationmodel
Factored translationmodelFactored translationmodel
Factored translationmodel
 
10.combination of sm_tn_rbmt
10.combination of sm_tn_rbmt10.combination of sm_tn_rbmt
10.combination of sm_tn_rbmt
 
9. cgc parser with_norml_std
9. cgc parser with_norml_std9. cgc parser with_norml_std
9. cgc parser with_norml_std
 
8. relearnt rbmt
8. relearnt rbmt8. relearnt rbmt
8. relearnt rbmt
 
Summary of English Japanese Translation by MSR-MT
Summary of English Japanese Translation by MSR-MTSummary of English Japanese Translation by MSR-MT
Summary of English Japanese Translation by MSR-MT
 
5. bleu
5. bleu5. bleu
5. bleu
 
Mt framework nagao_makoto
Mt framework nagao_makotoMt framework nagao_makoto
Mt framework nagao_makoto
 
Approach to japanese english automatic translation by Susumu Kuno
Approach to japanese english automatic translation by Susumu KunoApproach to japanese english automatic translation by Susumu Kuno
Approach to japanese english automatic translation by Susumu Kuno
 
Machine translation
Machine translationMachine translation
Machine translation
 

Dernier

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Dernier (20)

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

A statistical approach to machine translation

  • 1. A Statistical Approach to Machine Translation Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jeinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin Published in: · Journal Computational Linguistics archive Volume 16 Issue 2, June 1990 Pages 79-85 MIT Press Cambridge, MA, USA
  • 2. Introduction • (S, T): Every pair of sentences • Pr(T|S): probability of T when presented with S • Given a sentence T in the target language, seek the sentence S from which the translator produced T. • Chance of error is minimized by choosing the most probable sentence S given T – In other words, to maximize Pr(S|T) – Pr(S|T) = Pr(S) Pr(T|S) / Pr(T) : Bayes’ theorem – Pr(S): language model probability of S – Pr(T|S): translation probability of T given S
  • 3. • Statistical Translation System requires: – A method for computing language model probabilities – A method for computing translation probabilities – A method for searching among possible source sentences S for the one that gives the greatest value for Pr(S)Pr(T|S)
  • 4. The Language Model • a word string S1, S2, … Sn • Pr(S1, S2, … Sn) = Pr(S1) Pr(S2|S1)…Pr(Sn|S1S2…Sn-1) • Given a history S1S2…Sj-1, you must know the probability of object word Sj • Histories are too long, probability parameters cannot be separated • To reduce parameters: – Categorize each history into same class – Probability of object depends on the history in same class
  • 5. The Translation Model • A word can be translated into more than one word – Fertility: num. of T words that an S word produces • Notation for alignment: – (Jean aime Marie | John(1) loves(2) Mary(3) ) • ( T | S) • Num. in S words are positions in T • Computing the probability of the alignment: – (Le chien est battu par Jean | John(6) does beat(3, 4) the(1) dog(2))
  • 6. The Translation Model • In English adjectives precede nouns, in French adj. follows nouns – Distortion: T word appears far from S word in alignment • Distortion probability: – Pr(i|j, l) • i: a target position • J: a source position • l: the target length
  • 7. Searching • Searching for the sentence S to maximize Pr(S)Pr(T|S) • Uses stack search – A list of partial alignment hypothesis – (Jean aime Marie| * ) • *: a place holder for unknown sequence of S – In iteration, extends most promising entries and adds to its hypothesis – Ends when complete alignment is the most promising significantly
  • 8. Parameter Estimation • Both LM and TM have many parameters • To estimate, needs pairs of translations • For this experiment, they used Canadian parliament’s records translated in English/French – Param. & LM/TM can be estimated from this
  • 9. Two Pilot Experiments • First experiment: to estimate params for TM • 9000 most common words in English/French from Hansard data
  • 10. Two Pilot Experiment • Second experiment: – French to English – 1,000 most frequently used English words – 1,700 most frequently used French words ( covered by the 1k English words) – Estimated 17 million parameters of translation model from 117,000 pairs of sentences – Est. bigram language model from 570,000 sentences from the English part of Hansard – Evaluation: • Exact: Decoded sentence was exactly the same • Alternate: same meaning, slightly different words • Different: not covey the same meaning as the translation • Wrong: makes a sense but not interpreted • Ungrammatical: grammatically deficient