Information Retrieval as
Statistical Translation
ADAM BERGER & JOHN LAFFERTY 1999
Bhavesh Singh
2010cs50281
OUTLINE
• INTRODUCTION
• MODEL OF QUERY GENERATION
• PREVIOUS WORK USING 2-POISSON MODEL
• STATISTICAL TRANSLATION
• MODELS OF DOCUMENT-QUERY TRANSLATION
• WORD-FOR-WORD TRANSLATION
• EXPERIMENTAL RESULTS
• CRITIQUE
INTRODUCTION
• Information Retrieval (IR): Obtaining information resources relevant to an information need from a
collection of information resources (documents).
• Predicting relevance is the central goal of IR.
• The paper proposes a new probabilistic approach to IR, based on the ideas and methods of statistical machine translation.
• A model is the medium between data and understanding.
• Ultimately, document retrieval systems must be sophisticated enough to handle polysemy and
synonymy.
INTRODUCTION (…cont.)
SOME BASIC TERMINOLOGY
PRECISION is the fraction of the documents retrieved that are relevant to the user's information
need.

RECALL is the fraction of the documents that are relevant to the query that are successfully
retrieved.

There is typically an inverse relation between precision and recall: retrieving more documents tends to raise recall at the cost of precision.
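As a minimal sketch of the two definitions above, the Python function below computes precision and recall from a collection of retrieved document IDs and a collection of relevant ones. The function name, the IDs and the example numbers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant, 6 relevant in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d3", "d7", "d8", "d9"])
print(p, r)  # 0.75 0.5
```

Retrieving more documents can only keep or increase the number of hits, so recall never drops, while precision falls whenever the extra documents are not relevant.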
MODEL OF QUERY GENERATION
• The user ‘U’ has an information need ‘I’.

• From this need, he generates an ideal document ‘d’.
• Ideal Document: a perfect fit for the user, but almost certainly not present in the retrieval system’s
collection of documents.
• He selects a set of key terms from ‘d’, and generates a query ‘q’ from this set.

In this setting, the task of a retrieval system is to find those documents most similar to ‘d’.
The Retrieval System’s task
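The task is to find the most likely documents given the query, that is, those ‘d’ for which p(d | q, U) is highest. By Bayes’ law,

$$p(d \mid q, U) = \frac{p(q \mid d, U)\, p(d \mid U)}{p(q \mid U)}$$

Since the denominator p(q | U) is fixed for a given query and user, we can ignore it for the purpose of ranking documents, and define the relevance of a document to a query as

$$\rho_q(d) \stackrel{\text{def}}{=} p(q \mid d, U)\, p(d \mid U)$$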
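The claim that p(q | U) can be dropped for ranking can be checked numerically. The sketch below is an illustration, not the paper's code: it ranks documents by log p(q | d, U) + log p(d | U) and compares that order with the full posterior, where p(q | U) is assumed, for illustration only, to be the sum of the joint probabilities over a small collection. The function names and all numbers are hypothetical.

```python
import math


def rank_documents(log_likelihood, log_prior):
    """Rank documents by relevance p(q | d, U) * p(d | U), computed in log space.

    log_likelihood: dict doc_id -> log p(q | d, U)
    log_prior:      dict doc_id -> log p(d | U)
    p(q | U) is omitted: dividing every score by the same positive
    constant cannot change the order.
    """
    scores = {d: log_likelihood[d] + log_prior[d] for d in log_likelihood}
    return sorted(scores, key=scores.get, reverse=True)


def posterior(log_likelihood, log_prior):
    """Full Bayes posterior p(d | q, U), normalised over the given documents."""
    joint = {d: math.exp(log_likelihood[d] + log_prior[d]) for d in log_likelihood}
    evidence = sum(joint.values())  # plays the role of p(q | U) in this toy setting
    return {d: v / evidence for d, v in joint.items()}


# Hypothetical likelihoods and a uniform prior over three documents.
ll = {"d1": math.log(0.02), "d2": math.log(0.05), "d3": math.log(0.01)}
prior = {d: math.log(1 / 3) for d in ll}
print(rank_documents(ll, prior))  # ['d2', 'd1', 'd3']
```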
2-POISSON MODEL (PREVIOUS WORK)
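The 2-Poisson model is a mixture, that is, a linear combination, of two Poisson distributions:

$$p(X = k) = \alpha\,\frac{e^{-\lambda_1}\lambda_1^{k}}{k!} + (1 - \alpha)\,\frac{e^{-\lambda_2}\lambda_2^{k}}{k!}$$

where α is the probability that a document belongs to E_t, the elite set of term t (the few documents in which t occurs more densely and non-randomly), and λ1 and λ2 are the mean frequencies of t inside and outside E_t.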
In the context of IR, the 2-Poisson is used to model the probability distribution of the frequency X of a term
in a collection of documents.
The effectiveness of the 2-Poisson model for document retrieval was never tested, for two reasons. First, learning its three parameters with the Expectation Maximization (EM) algorithm is expensive when done for each term, and large collections in general contain millions of terms. Second, the model does not take document length into account, so it would have to be extended to normalize for different document lengths.
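To make the cost argument concrete, here is a minimal EM sketch for fitting the three parameters (α, λ1, λ2) of a 2-Poisson mixture to the per-document frequencies of a single term. It assumes the mixture form p(X = k) = α Poisson(k; λ1) + (1 - α) Poisson(k; λ2); the initialisation, the fixed iteration count and the numerical clamps are illustrative choices, not taken from the source. Running a loop like this for every term of a large vocabulary is what makes the approach expensive.

```python
import math


def log_poisson(k, lam):
    """Log of the Poisson probability e^{-lam} lam^k / k!."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def two_poisson_pmf(k, alpha, lam1, lam2):
    """2-Poisson mixture: alpha * Poisson(k; lam1) + (1 - alpha) * Poisson(k; lam2)."""
    return alpha * math.exp(log_poisson(k, lam1)) + (1 - alpha) * math.exp(log_poisson(k, lam2))


def fit_two_poisson(freqs, iters=100, eps=1e-9):
    """Fit (alpha, lam1, lam2) to one term's per-document frequencies with EM.

    Initialisation and the eps clamps are illustrative assumptions.
    """
    n = len(freqs)
    mean = sum(freqs) / n
    # Start the "elite" component above the mean and the other one below it.
    alpha, lam1, lam2 = 0.5, 2.0 * mean + eps, 0.5 * mean + eps
    for _ in range(iters):
        # E-step: responsibility r = P(document is elite | frequency k).
        resp = []
        for k in freqs:
            la = math.log(alpha) + log_poisson(k, lam1)
            lb = math.log(1 - alpha) + log_poisson(k, lam2)
            top = max(la, lb)  # subtract the max before exponentiating to avoid underflow
            ea, eb = math.exp(la - top), math.exp(lb - top)
            resp.append(ea / (ea + eb))
        # M-step: closed-form updates for the mixing weight and the two means.
        r_sum = sum(resp)
        alpha = min(max(r_sum / n, eps), 1 - eps)
        lam1 = max(sum(r * k for r, k in zip(resp, freqs)) / max(r_sum, eps), eps)
        lam2 = max(sum((1 - r) * k for r, k in zip(resp, freqs)) / max(n - r_sum, eps), eps)
    return alpha, lam1, lam2


# Hypothetical term: 50 documents with 0-1 occurrences, 10 "elite" documents with 8-10.
freqs = [0] * 40 + [1] * 10 + [8] * 5 + [10] * 5
alpha, lam1, lam2 = fit_two_poisson(freqs)
print(round(alpha, 3), round(lam1, 2), round(lam2, 2))
```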
STATISTICAL MACHINE TRANSLATION
Automatic translation by computer was first contemplated by Warren Weaver when modern
computers were in their infancy.
The central problem of statistical MT is to build a system that automatically learns how to
translate text, by inspecting a large set of sentences in one language along with their
translations into another language.
Let t(f | e) denote the translation probability that an English word ‘e’ translates to a French word ‘f’.
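STATISTICAL MT (…cont.)
The probability that an English sentence e = {e1, e2, …, el} translates to a French sentence f = {f1, f2, …, fm} is calculated as

$$p(f \mid e) = \frac{\Gamma}{(l+1)^{m}} \sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j}) = \frac{\Gamma}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)$$

where Γ is a normalizing factor and e0 is an empty ("null") English word. The hidden variable in this model is the alignment a between the French and English words: aj = k means that the kth English word translates to the jth French word.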
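A compact sketch of EM training for t(f | e) in the style of IBM Model 1 follows. To keep it short it assumes no empty (null) English word, so each French word must align to one of the actual English words and the sentence probability divides by l^m, where l is the English sentence length; Γ is left as a parameter defaulting to 1. A toy German-English corpus (hypothetical) plays the role of the French-English sentence pairs.

```python
from collections import defaultdict


def train_model1(pairs, iters=10):
    """Estimate t(f | e) with EM; pairs is a list of (english_tokens, french_tokens)."""
    french_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(french_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)], initialised uniformly
    for _ in range(iters):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                # E-step: posterior over which English word f aligns to;
                # in Model 1 each French word's alignment is independent.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts so that sum_f t(f | e) = 1.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def sentence_prob(t, es, fs, gamma=1.0):
    """p(f | e) = gamma / l^m * prod_j sum_i t(f_j | e_i), without a null word."""
    p = gamma / len(es) ** len(fs)
    for f in fs:
        p *= sum(t.get((f, e), 0.0) for e in es)
    return p


pairs = [("the house".split(), "das haus".split()),
         ("the book".split(), "das buch".split()),
         ("a book".split(), "ein buch".split())]
t = train_model1(pairs)
print(t[("haus", "house")], t[("das", "house")])
```

Because "das" co-occurs with both "the house" and "the book", EM gradually attributes it to "the", leaving "haus" as the preferred translation of "house".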
MODEL OF DOCUMENT-QUERY TRANSLATION
First, a word ‘w’ is chosen at random from the document d according to a distribution l(w | d), which we call the document language model.
Next, ‘w’ is translated into the word or phrase ‘q’ according to a translation model with parameters t(q | w).
Thus, the probability of choosing q as a representative of the document d is

$$p(q \mid d) = \sum_{w} l(w \mid d)\, t(q \mid w)$$
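Now assume that the sample-size model φ(n | d), which gives the number n of words drawn to form the query, is a Poisson distribution with mean λ(d):

$$\phi(n \mid d) = \frac{e^{-\lambda(d)}\,\lambda(d)^{n}}{n!}$$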
MODEL OF DOCUMENT-QUERY TRANSLATION (…cont.)
Under this assumption, with the number of samples ‘n’ Poisson-distributed, the probability that a particular query q = q1, q2, …, qm is generated is

$$p(q \mid d) = \frac{e^{-\lambda(d)}\,\lambda(d)^{m}}{m!} \prod_{i=1}^{m} \sum_{w} l(w \mid d)\, t(q_i \mid w)$$
This is Model 1 of document-query translation, inspired by the IBM statistical translation models.
The translation probabilities t(q | w) of Model 1 are fitted with the Expectation Maximization (EM) algorithm.
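The Model 1 query likelihood can be sketched in a few lines. This sketch assumes that l(w | d) is the unsmoothed relative frequency of w in d, takes the translation table t(q | w) and the Poisson mean λ(d) as given inputs, and works in log space; the function name and the table are hypothetical. It shows how a query term that never occurs in the document (here "automobile") can still receive probability through a related document word, which is how translation probabilities address synonymy.

```python
import math
from collections import Counter


def log_p_query_model1(query, doc_tokens, t, lam):
    """log p(q | d) under Model 1 with a Poisson(lam) query-length model.

    query:      list of query terms q_1..q_m
    doc_tokens: list of document words; l(w | d) = relative frequency (assumption)
    t:          dict mapping (q, w) -> t(q | w)
    lam:        Poisson mean lambda(d) for the number of query terms
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    m = len(query)
    log_p = m * math.log(lam) - lam - math.lgamma(m + 1)  # log phi(m | d)
    for q in query:
        # p(q_i | d) = sum_w l(w | d) * t(q_i | w)
        p_q = sum((c / n) * t.get((q, w), 0.0) for w, c in counts.items())
        if p_q == 0.0:
            return float("-inf")  # no word of d can translate into q_i
        log_p += math.log(p_q)
    return log_p


# Hypothetical translation table; for each source word w, t(. | w) sums to 1.
t = {("car", "car"): 0.7, ("automobile", "car"): 0.3,
     ("engine", "engine"): 0.8, ("car", "engine"): 0.2}
doc = "car engine car".split()
print(log_p_query_model1(["automobile"], doc, t, lam=1.0))
```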
MODEL 0 – THE SIMPLEST CASE: WORD-FOR-WORD TRANSLATION
The simplest version of Model 1, which we distinguish as Model 0, is one where each word ‘w’ can be translated only as itself; that is, the translation probabilities are “diagonal”:

$$t(q \mid w) = \begin{cases} 1 & \text{if } q = w \\ 0 & \text{otherwise} \end{cases}$$
Under this model, the query terms are chosen simply according to their frequency of occurrence
in the document.
The probability of the query in this case is simply

$$p(q \mid d) = \frac{e^{-\lambda(d)}\,\lambda(d)^{m}}{m!} \prod_{i=1}^{m} l(q_i \mid d)$$
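For comparison, here is a sketch of Model 0 under the same illustrative assumptions: relative-frequency l(w | d), a given Poisson mean λ(d), and hypothetical data and function names. Because the translation table is diagonal, a query term absent from the document drives the whole query probability to zero, so some form of smoothing would likely be needed in practice.

```python
import math
from collections import Counter


def log_p_query_model0(query, doc_tokens, lam):
    """log p(q | d) under Model 0: t(q | w) = 1 if q == w else 0, so p(q_i | d) = l(q_i | d)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    m = len(query)
    log_p = m * math.log(lam) - lam - math.lgamma(m + 1)  # log phi(m | d)
    for q in query:
        if counts[q] == 0:
            return float("-inf")  # term absent from d: probability zero
        log_p += math.log(counts[q] / n)  # l(q_i | d) as relative frequency
    return log_p


doc = "car engine car".split()
print(log_p_query_model0(["car"], doc, lam=1.0))
print(log_p_query_model0(["automobile"], doc, lam=1.0))  # -inf
```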
EXPERIMENTAL RESULTS

Precision-Recall plots. The left plot compares Model 1 to Model 0 on the SDR data. The right
plot compares the same language model scored according to Model 0, demonstrating that the
approximations are very good.
CRITIQUE
One reason the 2-Poisson model was never tested is that learning its three parameters for each term is expensive, because the Expectation Maximization algorithm needs several iterations to converge.
The paper likewise uses EM to fit the translation probabilities of Model 1, so this is also an expensive operation. The efficiency of EM for Model 1 is not discussed in enough detail and deserves more elaboration.
THANK YOU

More Related Content

What's hot

Probabilistic retrieval model
Probabilistic retrieval modelProbabilistic retrieval model
Probabilistic retrieval modelbaradhimarch81
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memoriesRIILP
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine TranslationRIILP
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationCSCJournals
 
Improving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeImproving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeGaetano Rossiello, PhD
 
Cohesive Software Design
Cohesive Software DesignCohesive Software Design
Cohesive Software Designijtsrd
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsEditor IJCATR
 
Search Engines
Search EnginesSearch Engines
Search Enginesbutest
 
Proposed Method for String Transformation using Probablistic Approach
Proposed Method for String Transformation using Probablistic ApproachProposed Method for String Transformation using Probablistic Approach
Proposed Method for String Transformation using Probablistic ApproachEditor IJMTER
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsShubhangi Tandon
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Primya Tamil
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 

What's hot (20)

A survey of xml tree patterns
A survey of xml tree patternsA survey of xml tree patterns
A survey of xml tree patterns
 
Probabilistic retrieval model
Probabilistic retrieval modelProbabilistic retrieval model
Probabilistic retrieval model
 
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
5. manuel arcedillo & juanjo arevalillo (hermes) translation memories
 
Ir 09
Ir   09Ir   09
Ir 09
 
Svv
SvvSvv
Svv
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
An Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif IdentificationAn Application of Pattern matching for Motif Identification
An Application of Pattern matching for Motif Identification
 
Improving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior KnowledgeImproving Neural Abstractive Text Summarization with Prior Knowledge
Improving Neural Abstractive Text Summarization with Prior Knowledge
 
Cohesive Software Design
Cohesive Software DesignCohesive Software Design
Cohesive Software Design
 
Ir 08
Ir   08Ir   08
Ir 08
 
G04124041046
G04124041046G04124041046
G04124041046
 
Sentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic RelationsSentence Validation by Statistical Language Modeling and Semantic Relations
Sentence Validation by Statistical Language Modeling and Semantic Relations
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
Proposed Method for String Transformation using Probablistic Approach
Proposed Method for String Transformation using Probablistic ApproachProposed Method for String Transformation using Probablistic Approach
Proposed Method for String Transformation using Probablistic Approach
 
Decision tables
Decision tablesDecision tables
Decision tables
 
TextRank: Bringing Order into Texts
TextRank: Bringing Order into TextsTextRank: Bringing Order into Texts
TextRank: Bringing Order into Texts
 
Cmpe 255 Short Story Assignment
Cmpe 255 Short Story AssignmentCmpe 255 Short Story Assignment
Cmpe 255 Short Story Assignment
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 
Ir 03
Ir   03Ir   03
Ir 03
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 

Similar to Information retrieval as statistical translation

Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...Lifeng (Aaron) Han
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligencevini89
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...iyo
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringIJRES Journal
 
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Editor IJCATR
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Edmond Lepedus
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...ijnlc
 
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...Lifeng (Aaron) Han
 
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...kevig
 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERkevig
 
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET Journal
 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERijnlc
 
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...csandit
 
International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)CSCJournals
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)KU Leuven
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modelingHiroyuki Kuromiya
 
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...Lifeng (Aaron) Han
 

Similar to Information retrieval as statistical translation (20)

Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...Unsupervised Quality Estimation Model for English to German Translation and I...
Unsupervised Quality Estimation Model for English to German Translation and I...
 
Ju3517011704
Ju3517011704Ju3517011704
Ju3517011704
 
Topicmodels
TopicmodelsTopicmodels
Topicmodels
 
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on HmmEquirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
Equirs: Explicitly Query Understanding Information Retrieval System Based on Hmm
 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
 
Seeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text ClusteringSeeds Affinity Propagation Based on Text Clustering
Seeds Affinity Propagation Based on Text Clustering
 
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...
 
Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...Understanding Natural Languange with Corpora-based Generation of Dependency G...
Understanding Natural Languange with Corpora-based Generation of Dependency G...
 
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
EMPLOYING PIVOT LANGUAGE TECHNIQUE THROUGH STATISTICAL AND NEURAL MACHINE TRA...
 
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
ACL-WMT2013.A Description of Tunable Machine Translation Evaluation Systems i...
 
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
 
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
 
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...
T EXT M INING AND C LASSIFICATION OF P RODUCT R EVIEWS U SING S TRUCTURED S U...
 
International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)International Journal of Computer Science and Security Volume (4) Issue (2)
International Journal of Computer Science and Security Volume (4) Issue (2)
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
 
Basic review on topic modeling
Basic review on  topic modelingBasic review on  topic modeling
Basic review on topic modeling
 
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
COLING 2012 - LEPOR: A Robust Evaluation Metric for Machine Translation with ...
 

Recently uploaded

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Recently uploaded (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Information retrieval as statistical translation

  • 1. Information Retrieval as Statistical Translation ADAM BERGER & JOHN LAFFERTY 1999 Bhavesh Singh 2010cs50281
  • 2. OUTLINE • • • • • • • • INTRODUCTION MODEL OF QUERY GENERATION PREVIOUS WORK USING 2-POISSON MODEL STATISTICAL TRANSLATION MODELS OF DOCUMENT-QUERY TRANSLATION WORD-FOR-WORD TRANSLATION EXPERIMENTAL RESULTS CRITIQUE
  • 3. INTRODUCTION • Information Retrieval (IR): Obtaining information resources relevant to an information need from a collection of information resources (documents). • Predicting relevance is the central goal of IR. • A new probabilistic approach to IR based upon the ideas and methods of statistical machine translation. • Model: Medium between data and understanding. • Ultimately, document retrieval systems must be sophisticated enough to handle polysemy and synonymy.
  • 4. INTRODUCTION (…cont.) SOME BASIC TERMINOLOGIES PRECISION is the fraction of the retrieved documents that are relevant to the user's information need. RECALL is the fraction of the relevant documents that are successfully retrieved. There is typically an inverse relation between precision and recall: retrieving more documents tends to raise recall and lower precision.
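The following minimal sketch computes both measures for a single query. The document ids and relevance judgments are hypothetical. The loop over a ranked list only illustrates the trade-off and does not reproduce any result from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  set of document ids judged relevant to the information need
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 6 relevant documents in the collection.
relevant = {"d1", "d3", "d4", "d7", "d8", "d9"}
print(precision_recall(["d1", "d2", "d3", "d4"], relevant))  # (0.75, 0.5)

# Walking down a ranked list: recall can only grow, precision tends to drop.
ranking = ["d1", "d3", "d2", "d5", "d4", "d6", "d7", "d10"]
for k in (2, 4, 8):
    print(k, precision_recall(ranking[:k], relevant))
```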
  • 5. MODEL OF QUERY GENERATION • The user ‘U’ has an information need ‘I’. • From this need, he generates an ideal document ‘d’. • Ideal document: a perfect fit for the user, but almost certainly not present in the retrieval system’s collection of documents. • He selects a set of key terms from ‘d’ and generates a query ‘q’ from this set. In this setting, the task of a retrieval system is to find those documents most similar to ‘d’.
  • 7. The Retrieval System’s task The task is to find the most likely documents given the query, that is, those ‘d’ for which p(d | q, U) is highest. By Bayes’ law, $$ p(d \mid q, U) = \frac{p(q \mid d, U)\, p(d \mid U)}{p(q \mid U)}. $$ Since the denominator p(q | U) is fixed for a given query and user, we can ignore it for the purpose of ranking documents and define the relevance of a document to a query as $$ \rho_q(d) = p(q \mid d, U)\, p(d \mid U). $$
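The ranking argument can be checked numerically. This sketch assumes that p(q | d, U) and p(d | U) are already available, and the numbers are made up. It scores documents by the numerator of Bayes' law in log space, then confirms that normalizing by p(q | U) leaves the order unchanged. The function names are hypothetical.

```python
import math


def log_relevance(query_likelihood, prior):
    """Return log rho_q(d) = log p(q | d, U) + log p(d | U) for every document.

    query_likelihood: dict doc_id -> p(q | d, U) for the current query
    prior:            dict doc_id -> p(d | U)
    Log space avoids underflow when p(q | d, U) is a product of many small terms.
    """
    def safe_log(x):
        return math.log(x) if x > 0.0 else float("-inf")

    return {d: safe_log(query_likelihood[d]) + safe_log(prior[d]) for d in prior}


def posterior(query_likelihood, prior):
    """Full Bayes posterior p(d | q, U), normalised by p(q | U) = sum_d p(q | d, U) p(d | U)."""
    evidence = sum(query_likelihood[d] * prior[d] for d in prior)
    return {d: query_likelihood[d] * prior[d] / evidence for d in prior}


# Made-up numbers for three documents and one query.
likelihood = {"d1": 0.02, "d2": 0.05, "d3": 0.01}
prior = {"d1": 0.6, "d2": 0.2, "d3": 0.2}

scores = log_relevance(likelihood, prior)
post = posterior(likelihood, prior)
print(sorted(scores, key=scores.get, reverse=True))  # ['d1', 'd2', 'd3']
print(sorted(post, key=post.get, reverse=True))      # same order after dividing by p(q | U)
```

In this example, the non-uniform prior lifts d1 above d2 even though d2 has the higher query likelihood.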
  • 8. 2-POISSON MODEL (PREVIOUS WORK) The 2-Poisson model is a mixture, that is, a linear combination, of two Poisson distributions: $$ P(X = k) = \alpha \frac{e^{-\lambda_1}\lambda_1^{k}}{k!} + (1 - \alpha)\frac{e^{-\lambda_2}\lambda_2^{k}}{k!}, $$ where α is the probability that a document belongs to E_t, the elite set of term t. E_t consists of the few documents in which t occurs more densely and non-randomly. λ_1 and λ_2 are the mean frequencies of t inside and outside the elite set. In the context of IR, the 2-Poisson model is used to model the probability distribution of the frequency X of a term in the documents of a collection. The effectiveness of the 2-Poisson model for document retrieval was never tested, for two reasons. First, learning its three parameters (α, λ_1, λ_2) for each term with the Expectation Maximization (EM) algorithm is expensive, and large collections generally contain millions of terms. Second, the model does not take document length into account, so it would have to be extended to normalize for different document lengths.
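Below is a minimal pure-Python sketch of fitting the three parameters for a single term with EM. It also shows why repeating this for millions of terms is costly: every iteration touches every document. The starting values and the toy counts are assumptions for illustration. Like the model itself, the sketch ignores document length.

```python
import math


def log_poisson(k, lam):
    # log of e^{-lam} lam^k / k!
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def fit_two_poisson(counts, iters=200):
    """Fit the three 2-Poisson parameters (alpha, lam1, lam2) of one term by EM.

    counts: within-document frequencies X of the term, one entry per document.
    lam1 is the mean rate inside the elite set E_t, lam2 the rate outside it.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Heuristic start (an assumption): elite rate above the mean, non-elite below it.
    alpha, lam1, lam2 = 0.5, 2.0 * mean + 1e-3, 0.5 * mean + 1e-3
    for _ in range(iters):
        # E-step: r_i = P(document i is in E_t | X = x_i), computed stably in log space.
        resp = []
        for x in counts:
            la = math.log(alpha) + log_poisson(x, lam1)
            lb = math.log(1.0 - alpha) + log_poisson(x, lam2)
            m = max(la, lb)
            ea, eb = math.exp(la - m), math.exp(lb - m)
            resp.append(ea / (ea + eb))
        # M-step: closed-form updates from the soft assignments (clamped away from 0 and n).
        total = min(max(sum(resp), 1e-10), n - 1e-10)
        alpha = total / n
        lam1 = max(sum(r * x for r, x in zip(resp, counts)) / total, 1e-10)
        lam2 = max(sum((1.0 - r) * x for r, x in zip(resp, counts)) / (n - total), 1e-10)
    return alpha, lam1, lam2


# Toy frequencies for one term: most documents mention it rarely, a few densely.
counts = [0] * 40 + [1] * 10 + [5, 6, 7, 4, 6, 5, 8, 6, 5, 7]
alpha, lam1, lam2 = fit_two_poisson(counts)
print(f"alpha={alpha:.3f}  lam1={lam1:.2f}  lam2={lam2:.2f}")
```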
  • 9. STATISTICAL MACHINE TRANSLATION Automatic translation by computer was first contemplated by Warren Weaver when modern computers were in their infancy. The central problem of statistical MT is to build a system that automatically learns how to translate text by inspecting a large set of sentences in one language along with their translations into another language. Let t(f | e) denote the translation probability that the English word e translates into the French word f.
  • 10. STATISTICAL MT (…cont.) The probability that an English sentence e = e_1, e_2, … translates to a French sentence f = f_1, f_2, …, f_m is calculated as $$ p(f \mid e) = \Gamma \sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j}), $$ where Γ is a normalizing factor. The hidden variable in this model is the alignment a between the French and English words: a_j = k means that the kth English word translates to the jth French word.
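The sketch below trains t(f | e) with EM on a made-up three-sentence corpus. It relies on a standard property of this model. Each a_j ranges independently over the English positions, so the sum over alignments factorizes as Σ_a ∏_j t(f_j | e_{a_j}) = ∏_j Σ_k t(f_j | e_k). The NULL English word used in the full IBM formulation is omitted. Γ is passed in as a placeholder constant rather than computed. The function names are hypothetical.

```python
import math
from collections import defaultdict


def train_ibm_model1(pairs, iters=20):
    """EM for the translation table t(f | e) of IBM Model 1 (no NULL word).

    pairs: list of (english_tokens, french_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    french_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(french_vocab)
    t = defaultdict(lambda: uniform)          # start from a uniform table
    for _ in range(iters):
        count = defaultdict(float)            # expected counts c(f, e)
        total = defaultdict(float)            # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                # E-step: posterior probability that f_j is aligned to each e_k
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    post = t[(f, e)] / z
                    count[(f, e)] += post
                    total[e] += post
        # M-step: renormalise the expected counts so that sum_f t(f | e) = 1
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


def sentence_likelihood(t, es, fs, gamma=1.0):
    """p(f | e) = Gamma * prod_j sum_k t(f_j | e_k); gamma is a placeholder constant."""
    return gamma * math.prod(sum(t.get((f, e), 0.0) for e in es) for f in fs)


pairs = [
    ("the house".split(), "la maison".split()),
    ("the blue house".split(), "la maison bleue".split()),
    ("the flower".split(), "la fleur".split()),
]
t = train_ibm_model1(pairs)
print(round(t[("la", "the")], 3), round(t[("maison", "house")], 3))
```

On this corpus, EM concentrates t(· | the) on "la", which in turn frees "maison" to align with "house".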
  • 11. MODEL OF DOCUMENT-QUERY TRANSLATION First, a word ‘w’ is chosen at random from the document d according to a distribution l(w | d), which we call the document language model. Next, ‘w’ is translated into the word or phrase ‘q’ according to a translation model with parameters t(q | w). Thus, the probability of choosing q as a representative of the document d is $$ p(q \mid d) = \sum_{w} l(w \mid d)\, t(q \mid w). $$ The number of samples n is governed by a sample size model φ(n | d), which is assumed to be a Poisson distribution with mean λ(d): $$ \phi(n \mid d) = \frac{e^{-\lambda(d)}\lambda(d)^{n}}{n!}. $$
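Here is a small sketch of the two ingredients. It assumes that l(w | d) is estimated by relative frequency and uses a hand-written, hypothetical translation table. A query word that never occurs in the document ("automobile") still receives probability through t(q | w), which is how the model addresses synonymy.

```python
import math
from collections import Counter


def document_language_model(doc_tokens):
    """l(w | d) as relative frequencies (an assumed estimator; the slides do not fix one)."""
    counts = Counter(doc_tokens)
    return {w: c / len(doc_tokens) for w, c in counts.items()}


def p_term_given_doc(q, l, t):
    """p(q | d) = sum_w l(w | d) t(q | w); t maps (q, w) -> probability, unknown pairs are 0."""
    return sum(lw * t.get((q, w), 0.0) for w, lw in l.items())


def sample_size_prob(n, lam):
    """phi(n | d): Poisson probability of drawing n words, with mean lam = lambda(d) > 0."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


# Hypothetical translation table: each row t(. | w) sums to 1.
t = {
    ("car", "car"): 0.7, ("automobile", "car"): 0.3,
    ("engine", "engine"): 0.9, ("motor", "engine"): 0.1,
    ("repair", "repair"): 1.0,
}
l = document_language_model("car engine car repair".split())
print(p_term_given_doc("automobile", l, t))  # 0.15, reachable only through "car"
```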
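  • 12. MODEL OF DOCUMENT-QUERY TRANSLATION (…cont.) The number of samples n is assumed to follow this Poisson distribution, and the query is treated as the set of distinct terms drawn at least once. Under these assumptions, the probability that a particular query q = q_1, q_2, …, q_m is generated is $$ p(q \mid d) = e^{-\lambda(d)} \prod_{i=1}^{m} \left( e^{\lambda(d)\, p(q_i \mid d)} - 1 \right), $$ where p(q_i | d) is the per-term probability of choosing q_i as a representative of d. This is Model 1 of document-query translation, inspired by the IBM statistical translation models. The translation probabilities of Model 1 are fitted with the Expectation Maximization (EM) algorithm.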
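This minimal sketch evaluates log p(q | d) for Model 1, given the per-term probabilities p(q_i | d) and a value for λ(d). It then checks the closed form against the per-word Poisson view, in which each word w is drawn an independent Poisson number of times with mean λ(d) p(w | d). The function name and the numbers are hypothetical.

```python
import math


def log_query_likelihood(term_probs, lam):
    """log p(q | d) = -lambda(d) + sum_i log(exp(lambda(d) * p(q_i | d)) - 1).

    term_probs: p(q_i | d) for each distinct query term q_i
    lam:        lambda(d), the Poisson mean of the sample-size model
    """
    total = -lam
    for p in term_probs:
        if p <= 0.0:
            return float("-inf")  # the document can never produce this term
        x = lam * p
        # log(e^x - 1), written two ways to stay accurate for small and large x
        total += math.log(math.expm1(x)) if x < 1.0 else x + math.log1p(-math.exp(-x))
    return total


# Per-word view: the query set {a, c} appears iff a and c are drawn and b is not.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
lam = 3.0
direct = (1 - math.exp(-lam * p["a"])) * (1 - math.exp(-lam * p["c"])) * math.exp(-lam * p["b"])
print(math.isclose(math.exp(log_query_likelihood([p["a"], p["c"]], lam)), direct))  # True
```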
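  • 13. MODEL 0, THE SIMPLEST CASE: WORD-FOR-WORD TRANSLATION The simplest version of Model 1, which we distinguish as Model 0, is one in which each word ‘w’ can be translated only as itself. That is, the translation probabilities are “diagonal”: $$ t(q \mid w) = \begin{cases} 1 & \text{if } q = w \\ 0 & \text{otherwise.} \end{cases} $$ Under this model p(q | d) = l(q | d), so the query terms are chosen simply according to their frequency of occurrence in the document. The probability of the query in this case is simply $$ p(q \mid d) = e^{-\lambda(d)} \prod_{i=1}^{m} \left( e^{\lambda(d)\, l(q_i \mid d)} - 1 \right). $$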
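Under the diagonal table the translation step disappears, so a Model 0 scorer only needs term counts. This sketch assumes unsmoothed relative frequencies for l(q | d) and a hypothetical λ(d) = 5. The documents and the function name are also illustrative. The second query shows the cost of dropping translation: a document about an "automobile" gets probability zero for the query "car".

```python
import math
from collections import Counter


def model0_log_likelihood(query_terms, doc_tokens, lam):
    """Model 0: t(q | w) = 1 if q == w else 0, so p(q | d) collapses to l(q | d).

    l(q | d) is the unsmoothed relative frequency of q in the document (an
    illustrative assumption); lam is a hypothetical lambda(d).
    """
    counts = Counter(doc_tokens)
    total = -lam
    for q in set(query_terms):            # the query is treated as a set of distinct terms
        x = lam * counts[q] / len(doc_tokens)
        if x == 0.0:
            return float("-inf")          # an absent term rules the document out entirely
        # log(e^x - 1), accurate for both small and large x
        total += math.log(math.expm1(x)) if x < 1.0 else x + math.log1p(-math.exp(-x))
    return total


docs = {
    "d1": "the car needs engine repair".split(),
    "d2": "the automobile needs a new engine".split(),
}
for query in (["engine", "repair"], ["car"]):
    print(query, {d: model0_log_likelihood(query, toks, lam=5.0) for d, toks in docs.items()})
```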
  • 14. EXPERIMENTAL RESULTS Figure: precision-recall plots. The left plot compares Model 1 to Model 0 on the SDR (spoken document retrieval) data. The right plot compares the same language model scored according to Model 0, demonstrating that the approximations are very good.
  • 15. CRITIQUE One reason the 2-Poisson model was never tested is that learning its three parameters for each term is expensive, because the Expectation Maximization algorithm needs several iterations to converge. According to this paper, the translation probabilities of Model 1 are also fitted with EM, so this is an expensive operation too. The paper does not discuss the efficiency of EM for Model 1 in enough detail, and this point should be elaborated.