NEURAL MODELS FOR
INFORMATION RETRIEVAL
BHASKAR MITRA
Principal Applied Scientist
Microsoft
Research Student
Dept. of Computer Science
University College London
NEURAL
NETWORKS
Amazingly successful on many difficult
application areas
Dominating multiple fields:
Each application is different, motivates
new innovations in machine learning
2011: speech, 2013: vision, 2015: NLP, 2017: IR?
Our research:
Novel representation learning methods and neural architectures
motivated by specific needs and challenges of IR tasks
TODAY’S TOPICS
Information Retrieval (IR) tasks
Representation learning for IR
Topic specific representations
Dealing with rare concepts/terms
This talk is based on work done in collaboration with
Nick Craswell, Fernando Diaz, Emine Yilmaz, Rich Caruana, and
Eric Nalisnick
Some of the content is from the manuscript under review for
Foundations and Trends® in Information Retrieval
Pre-print is available for free download
http://bit.ly/neuralir-intro
Final manuscript may contain additional content and changes
IR tasks
DOCUMENT RANKING
Information
need
query
results ranking (document list)
retrieval system indexes a
document corpus
Relevance
(documents satisfy
information need
e.g. useful)
OTHER IR TASKS
QUERY FORMULATION
cheap flights from london t|
cheap flights from london to frankfurt
cheap flights from london to new york
cheap flights from london to miami
cheap flights from london to sydney
NEXT QUERY SUGGESTION
cheap flights from london to miami
Related searches
package deals to miami
ba flights to miami
things to do in miami
miami tourist attractions
ANATOMY OF
AN IR MODEL
IR in three simple steps:
1. Generate input (query or prefix)
representation
2. Generate candidate (document
or suffix or query) representation
3. Estimate relevance based on
input and candidate
representations
Neural networks can be useful for
one or more of these steps
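The three steps above can be sketched in a few lines. This is a minimal illustration, not a model from the talk: it assumes bag-of-words term counts as the representation for both input and candidate, and cosine similarity as the relevance estimate. The names `represent`, `relevance`, and `rank` are hypothetical.

```python
from collections import Counter
import math

def represent(text):
    """Steps 1 and 2: map text to a sparse bag-of-words vector (assumed representation)."""
    return Counter(text.lower().split())

def relevance(input_vec, candidate_vec):
    """Step 3: estimate relevance as cosine similarity of the two representations."""
    dot = sum(w * candidate_vec[t] for t, w in input_vec.items())
    norm = (math.sqrt(sum(w * w for w in input_vec.values()))
            * math.sqrt(sum(w * w for w in candidate_vec.values())))
    return dot / norm if norm else 0.0

def rank(query, candidates):
    """Order candidates by estimated relevance to the query, best first."""
    q = represent(query)
    return sorted(candidates, key=lambda c: relevance(q, represent(c)), reverse=True)

ranked = rank(
    "cheap flights to london",
    ["hotels in london",
     "cheap flights to london from paris",
     "things to do in miami"],
)
```

A neural model could replace any of the three functions, for example by swapping `represent` for a learned embedding.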
EXAMPLE: CLASSICAL LEARNING TO
RANK (LTR) MODELS
1. Query and document
representation based on
manually crafted features
2. Neural network for matching
Representation
learning for IR
A QUICK REFRESHER ON
VECTOR SPACE
REPRESENTATIONS
Represent items by feature vectors – similar items have similar
vector representations
Corollary: the choice of features defines what items are similar
An embedding is a new space such that the properties of, and
the relationships between, the items are preserved
Compared to original feature space an embedding space may
have one or more of the following:
• Fewer dimensions
• Less sparseness
• Disentangled principal components
NOTIONS OF
SIMILARITY
Is “Seattle” more similar to…
“Sydney” (similar type)
Or
“Seahawks” (similar topic)
Depends on what feature space you choose
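As a small illustration of this point, the sketch below compares the same three items under two feature spaces. The feature values are hypothetical toy numbers invented for this example, not data from the talk.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical "type" features: [is_city, is_sports_team]
type_space = {"seattle": [1, 0], "sydney": [1, 0], "seahawks": [0, 1]}

# Hypothetical "topic" features: [pacific_northwest_context, australia_context]
topic_space = {"seattle": [1, 0], "sydney": [0, 1], "seahawks": [1, 0]}

def nearest(term, space):
    """Return the other item closest to `term` in the given feature space."""
    others = [t for t in space if t != term]
    return max(others, key=lambda t: cosine(space[term], space[t]))

by_type = nearest("seattle", type_space)
by_topic = nearest("seattle", topic_space)
```

Under the type features the nearest item to "seattle" is "sydney"; under the topic features it is "seahawks".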
DESIRED NOTION OF SIMILARITY SHOULD
DEPEND ON THE TARGET TASK
DOCUMENT RANKING (query: cheap flights to london)
✓ budget flights to london
✗ cheap flights to sydney
✗ hotels in london
QUERY FORMULATION (prefix: cheap flights to |)
✓ cheap flights to london
✓ cheap flights to sydney
✗ cheap flights to big ben
NEXT QUERY SUGGESTION (previous query: cheap flights to london)
✓ budget flights to london
✓ hotels in london
✗ cheap flights to sydney
Next, we take a sample model and show how the same model captures
different notions of similarity based on the data it is trained on
DEEP SEMANTIC SIMILARITY MODEL
(DSSM)
Siamese network with two deep sub-models
Projects input and candidate texts into
embedding space
Trained by maximizing cosine similarity between
correct input-output pairs
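One common way to turn "maximize cosine similarity for correct pairs" into a training objective is a softmax over the cosine similarities of one correct candidate and a few sampled incorrect ones. The sketch below assumes that formulation; the scaling factor `gamma` and the function name `dssm_style_loss` are assumptions for illustration, and the vectors stand in for the outputs of the two sub-models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def dssm_style_loss(input_vec, positive_vec, negative_vecs, gamma=10.0):
    """Negative log-probability of the correct candidate under a softmax
    over scaled cosine similarities (assumed objective, see lead-in)."""
    sims = [cosine(input_vec, positive_vec)]
    sims += [cosine(input_vec, n) for n in negative_vecs]
    logits = [gamma * s for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# The loss is small when the correct candidate is the closest one...
good = dssm_style_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# ...and large when an incorrect candidate is closer.
bad = dssm_style_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this loss pushes the input embedding toward the correct candidate and away from the sampled incorrect ones.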
SAME MODEL (DSSM), DIFFERENT
TRAINING DATA
(Shen et al., 2014)
https://dl.acm.org/citation...
(Mitra and Craswell, 2015)
https://dl.acm.org/citation...
DSSM trained on query-document pairs DSSM trained on query prefix-suffix pairs
Nearest neighbors for “seattle” and “taylor swift” based on two DSSM
models trained on different types of training data
SAME DSSM, TRAINED ON SESSION QUERY PAIRS
Can capture regularities in the query space
(similar to word2vec for terms)
(Mitra, 2015)
https://dl.acm.org/citation...
Groups of similar search intent
transitions from a query log
SAME DSSM, TRAINED ON SESSION QUERY PAIRS
Allows for analogy-like vector algebra over short text!
WHEN IS IT PARTICULARLY IMPORTANT TO
THINK ABOUT NOTIONS OF SIMILARITY?
When you use pre-trained embeddings instead of learning them in an
end-to-end model for the target task, for example because not enough training
data is available for fully supervised learning of representations
USING PRE-TRAINED WORD EMBEDDINGS
FOR DOCUMENT RANKING
Traditional IR models count the number of query term
matches in the document
Non-matching terms contain important evidence of
relevance
We can leverage term embeddings to recognize that the
document terms “population” and “area” indicate the
relevance of a document to the query “Albuquerque”
Passage about Albuquerque
Passage not about Albuquerque
USING WORD2VEC FOR SOFT-MATCHING
Compare every query and document term
in the embedding space
What if I told you that everyone using
word2vec is throwing half the model away?
DUAL EMBEDDING SPACE MODEL (DESM)
IN-OUT similarity captures a more topical notion of term-term relationship than IN-IN or OUT-OUT
Better to represent query terms using IN embeddings and document terms using OUT embeddings
(Mitra et al., 2016)
https://arxiv.org/abs/1602.01137
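A minimal sketch of IN-OUT scoring follows. It assumes the document is summarized by the centroid of its unit-normalized OUT vectors and that the score is the mean cosine similarity between each query term's IN vector and that centroid; see the cited paper for the exact formulation. The two-dimensional embeddings are hypothetical toy values, and `desm_score` is an illustrative name.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def desm_score(query_terms, doc_terms, in_emb, out_emb):
    """Mean cosine similarity between query IN vectors and the document's OUT centroid."""
    doc_vecs = [normalize(out_emb[t]) for t in doc_terms if t in out_emb]
    query_vecs = [normalize(in_emb[t]) for t in query_terms if t in in_emb]
    if not doc_vecs or not query_vecs:
        return 0.0
    dim = len(doc_vecs[0])
    centroid = normalize(
        [sum(v[i] for v in doc_vecs) / len(doc_vecs) for i in range(dim)]
    )
    sims = [sum(a * b for a, b in zip(q, centroid)) for q in query_vecs]
    return sum(sims) / len(sims)

# Hypothetical toy embeddings, for illustration only.
in_emb = {"albuquerque": [1.0, 0.0]}
out_emb = {
    "population": [0.9, 0.1], "area": [0.8, 0.2],
    "recipe": [0.0, 1.0], "oven": [0.1, 0.9],
}

about = desm_score(["albuquerque"], ["population", "area"], in_emb, out_emb)
not_about = desm_score(["albuquerque"], ["recipe", "oven"], in_emb, out_emb)
```

With these toy vectors, the passage containing "population" and "area" scores higher than the one that does not, even though neither contains the query term itself.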
IN-OUT VS. IN-IN
GET THE DATA
IN+OUT Embeddings for 2.7M words trained on 600M+ Bing queries
https://www.microsoft.com/en-us/download/details.aspx?id=52597
Topic specific
representations
WHY IS TOPIC-SPECIFIC TERM
REPRESENTATION USEFUL?
Terms can take different meanings in different
contexts; global representations are likely to be coarse
and inappropriate under certain topics
Global model likely to focus more on learning
accurate representations of popular terms
Often impractical to train on the full corpus; without
topic-specific sampling, important training instances
may be ignored
TOPIC-SPECIFIC TERM
EMBEDDINGS FOR
QUERY EXPANSION
[Figure: query “cut gasoline tax” → first-round results from corpus → topic-specific term embeddings → expanded query “cut gasoline tax deficit budget” → final results]
(Diaz et al., 2016)
http://anthology.aclweb.org/...
Use documents from first round of
retrieval to learn a query-specific
embedding space
Use learnt embeddings to find related
terms for query expansion for second
round of retrieval
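The expansion step can be sketched as a nearest-neighbor lookup. The sketch assumes term embeddings have already been trained on the first-round documents and simply ranks candidate terms by cosine similarity to the centroid of the query terms; that scoring rule is an assumption for illustration, not necessarily the one used in the cited paper. The vectors in `local_emb` are hypothetical toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query_terms, local_emb, k=2):
    """Append the k terms closest to the query centroid in the local embedding space."""
    known = [local_emb[t] for t in query_terms if t in local_emb]
    if not known:
        return list(query_terms)
    dim = len(known[0])
    centroid = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    candidates = [t for t in local_emb if t not in query_terms]
    candidates.sort(key=lambda t: cosine(centroid, local_emb[t]), reverse=True)
    return list(query_terms) + candidates[:k]

# Hypothetical embeddings learnt from first-round results (toy values).
local_emb = {
    "cut": [0.9, 0.1], "gasoline": [0.8, 0.3], "tax": [1.0, 0.2],
    "deficit": [0.9, 0.2], "budget": [0.8, 0.2], "haircut": [0.1, 1.0],
}

expanded = expand_query(["cut", "gasoline", "tax"], local_emb, k=2)
```

The expanded query would then be issued against the index for the second round of retrieval.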
[Figure: comparison of global and local embeddings]
THE IDEA OF LEARNING (OR UPDATING)
REPRESENTATIONS AT RUNTIME MAY BE
APPLICABLE IN THE CONTEXT OF AI APPROACHES
TO SOLVING OTHER MORE COMPLEX TASKS
For IR, in practice…
We can pre-train k (≈ millions of) different embedding spaces and pick
the most appropriate representation at runtime without re-training
Dealing with
rare concepts
and terms
A TALE OF TWO QUERIES
Query: “pekarovic land company”
Hard to learn good representation for
rare term pekarovic
But easy to estimate relevance based
on patterns of exact matches
Proposal: Learn a neural model to
estimate relevance from patterns of
exact matches
Query: “what channel seahawks on today”
Target document likely contains ESPN
or sky sports instead of channel
An embedding model can associate
ESPN in document to channel in query
Proposal: Learn embeddings of text
and match query with document in
the embedding space
THE DUET
ARCHITECTURE
Linear combination of two models trained
jointly on labelled query-document pairs
Local model operates on lexical interaction
matrix, while Distributed model projects text
into embedding space and estimates match
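The way the two signals combine can be sketched with simple stand-ins. In the sketch below the local part is replaced by the fraction of query terms with an exact match, and the distributed part by the cosine similarity of mean term embeddings; the actual sub-models are deep networks trained jointly, so both scorers, the function names, and the toy embeddings are assumptions for illustration only.

```python
import math

def local_score(query_terms, doc_terms):
    """Stand-in for the local model: fraction of query terms matched exactly."""
    if not query_terms:
        return 0.0
    doc = set(doc_terms)
    return sum(1 for q in query_terms if q in doc) / len(query_terms)

def mean_vector(terms, emb):
    """Average the embeddings of the terms that have one; None if there are none."""
    vecs = [emb[t] for t in terms if t in emb]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def distributed_score(query_terms, doc_terms, emb):
    """Stand-in for the distributed model: cosine of mean term embeddings."""
    q, d = mean_vector(query_terms, emb), mean_vector(doc_terms, emb)
    if q is None or d is None:
        return 0.0
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def duet_score(query_terms, doc_terms, emb):
    """Sum of the two stand-in scores."""
    return local_score(query_terms, doc_terms) + distributed_score(query_terms, doc_terms, emb)

# Hypothetical toy embeddings; the rare term "pekarovic" has no embedding.
emb = {"land": [1.0, 0.0], "company": [0.8, 0.2],
       "channel": [0.0, 1.0], "espn": [0.1, 0.9]}

rare_query = ["pekarovic", "land", "company"]
match = duet_score(rare_query, ["pekarovic", "land", "company"], emb)
other = duet_score(rare_query, ["espn", "channel"], emb)
soft = distributed_score(["channel"], ["espn"], emb)
```

The exact-match signal carries the rare-term query, while the embedding signal links "channel" to "espn" where no term matches exactly.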
[Figure: Duet architecture]
Local model: query text and doc text → query and doc term vectors → interaction matrix → fully connected layers for matching
Distributed model: query text and doc text → query and doc embeddings → Hadamard product → fully connected layers for matching
Sum of the local and distributed model outputs
(Mitra et al., 2017)
https://dl.acm.org/citation...
(Nanni et al., 2017)
https://dl.acm.org/citation...
TERM IMPORTANCE
LOCAL MODEL DISTRIBUTED MODEL
Query: united states president
LOCAL MODEL: TERM INTERACTION MATRIX
$X_{i,j} = \begin{cases} 1, & \text{if } q_i = d_j \\ 0, & \text{otherwise} \end{cases}$
In relevant documents,
→Many matches, typically clustered
→Matches localized early in
document
→Matches for all query terms
→In-order (phrasal) matches
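The binary term interaction matrix defined above is straightforward to build. The sketch below assumes whitespace tokenization and exact string equality; the function name `interaction_matrix` and the example document are illustrative.

```python
def interaction_matrix(query_terms, doc_terms):
    """X[i][j] is 1 when query term i equals document term j, else 0."""
    return [[1 if q == d else 0 for d in doc_terms] for q in query_terms]

query = "united states president".split()
doc = "the president of the united states".split()
X = interaction_matrix(query, doc)

# One row per query term, one column per document position.
for term, row in zip(query, X):
    print(f"{term:>10} {row}")
```

Each row shows where one query term occurs in the document, so clustered, early, and in-order matches appear as visible patterns in the matrix.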
LOCAL MODEL: ESTIMATING RELEVANCE
← document words →
Convolve using window of size $n_d \times 1$
Each window instance compares a query term w/
whole document
Fully connected layers aggregate evidence
across query terms - can model phrasal matches
DISTRIBUTED MODEL: ESTIMATING RELEVANCE
[Figure: query and document pass through convolution and pooling; the query embedding is combined with document embeddings by Hadamard product, followed by fully connected layers]
Convolve over query and
document terms
Match query with moving
windows over document
Learn text embeddings
specifically for the task
Matching happens in
embedding space
* Network architecture slightly
simplified for visualization; refer to the paper
for exact details
STRONG PERFORMANCE ON DOCUMENT AND
ANSWER RANKING TASKS
EFFECT OF TRAINING DATA VOLUME
Key finding: large quantity of training data necessary for learning good
representations, less impactful for training local model
If we classify models by
query level performance
there is a clear clustering of
lexical (local) and semantic
(distributed) models
GET THE CODE
Implemented using CNTK python API
https://github.com/bmitra-msft/NDRM/blob/master/notebooks/Duet.ipynb
Summary
RECAP
When using pre-trained embeddings
consider whether the relationship between
items in the embedding space is
appropriate for the target task
If a representation can be learnt (or
updated) based on the current input /
context it is likely to outperform a global
representation
It is difficult to learn good representations
for rare items, but important to incorporate
them in the modeling
While these insights are
grounded in IR tasks, they
should be generally
applicable to other NLP and
machine learning scenarios
LIST OF PAPERS DISCUSSED
An Introduction to Neural Information Retrieval
Bhaskar Mitra and Nick Craswell, in Foundations and Trends® in Information Retrieval, Now Publishers, 2017 (upcoming).
https://www.microsoft.com/en-us/research/publication/...
Learning to Match Using Local and Distributed Representations of Text for Web Search
Bhaskar Mitra, Fernando Diaz, and Nick Craswell, in Proc. WWW, 2017.
https://dl.acm.org/citation.cfm?id=3052579
Benchmark for Complex Answer Retrieval
Federico Nanni, Bhaskar Mitra, Matt Magnusson, and Laura Dietz, in Proc. ICTIR, 2017.
https://dl.acm.org/citation.cfm?id=3121099
Query Expansion with Locally-Trained Word Embeddings
Fernando Diaz, Bhaskar Mitra, and Nick Craswell, in Proc. ACL, 2016.
http://anthology.aclweb.org/P/P16/P16-1035.pdf
A Dual Embedding Space Model for Document Ranking
Bhaskar Mitra, Eric Nalisnick, Nick Craswell, and Rich Caruana, arXiv preprint, 2016.
https://arxiv.org/abs/1602.01137
Improving Document Ranking with Dual Word Embeddings
Eric Nalisnick, Bhaskar Mitra, Nick Craswell, and Rich Caruana, in Proc. WWW, 2016.
https://dl.acm.org/citation.cfm?id=2889361
Query Auto-Completion for Rare Prefixes
Bhaskar Mitra and Nick Craswell, in Proc. CIKM, 2015.
https://dl.acm.org/citation.cfm?id=2806599
Exploring Session Context using Distributed Representations of Queries and Reformulations
Bhaskar Mitra, in Proc. SIGIR, 2015.
https://dl.acm.org/citation.cfm?id=2766462.2767702
THANK YOU

Contenu connexe

Tendances

Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text MiningMinha Hwang
 
Lect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology developmentLect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology developmentAntonio Moreno
 
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...SlideTeam
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
An Introduction to Anomaly Detection
An Introduction to Anomaly DetectionAn Introduction to Anomaly Detection
An Introduction to Anomaly DetectionKenneth Graham
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and DefenseKishor Datta Gupta
 
Anomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoTAnomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoTDigital Vidya
 
Feature selection
Feature selectionFeature selection
Feature selectionDong Guo
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningShao-Chuan Wang
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment AnalysisRupak Roy
 
Feature selection
Feature selectionFeature selection
Feature selectiondkpawar
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsankit_ppt
 
Lecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdfLecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdfssuser4c50a9
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalitiesKrish_ver2
 

Tendances (20)

Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Text mining
Text miningText mining
Text mining
 
Lect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology developmentLect6-An introduction to ontologies and ontology development
Lect6-An introduction to ontologies and ontology development
 
Link-Based Ranking
Link-Based RankingLink-Based Ranking
Link-Based Ranking
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
An Introduction to Anomaly Detection
An Introduction to Anomaly DetectionAn Introduction to Anomaly Detection
An Introduction to Anomaly Detection
 
Adversarial Attacks and Defense
Adversarial Attacks and DefenseAdversarial Attacks and Defense
Adversarial Attacks and Defense
 
Anomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoTAnomaly Detection Using Machine Learning In Industrial IoT
Anomaly Detection Using Machine Learning In Industrial IoT
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Kdd process
Kdd processKdd process
Kdd process
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment Analysis
 
Feature selection
Feature selectionFeature selection
Feature selection
 
Ml10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topicsMl10 dimensionality reduction-and_advanced_topics
Ml10 dimensionality reduction-and_advanced_topics
 
Lecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdfLecture 5 Decision tree.pdf
Lecture 5 Decision tree.pdf
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 

Similaire à Neural Models for Information Retrieval

Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document RankingBhaskar Mitra
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...Amit Sheth
 
Reflective Teaching Essay
Reflective Teaching EssayReflective Teaching Essay
Reflective Teaching EssayLisa Williams
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information RetrievalBhaskar Mitra
 
Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017Debanjan Mahata
 
Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017Debanjan Mahata
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLTakeshi Morita
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPindico data
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureAggregage
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesDaniel Sonntag
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesMax Irwin
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Ay3313861388
Ay3313861388Ay3313861388
Ay3313861388IJMER
 
Natural Language Understanding of Systems Engineering Artifacts
Natural Language Understanding of Systems Engineering ArtifactsNatural Language Understanding of Systems Engineering Artifacts
Natural Language Understanding of Systems Engineering ArtifactsÁkos Horváth
 
Searching Repositories of Web Application Models
Searching Repositories of Web Application ModelsSearching Repositories of Web Application Models
Searching Repositories of Web Application ModelsMarco Brambilla
 

Similaire à Neural Models for Information Retrieval (20)

The Duet model
The Duet modelThe Duet model
The Duet model
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...
 
Reflective Teaching Essay
Reflective Teaching EssayReflective Teaching Essay
Reflective Teaching Essay
 
5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval5 Lessons Learned from Designing Neural Models for Information Retrieval
5 Lessons Learned from Designing Neural Models for Information Retrieval
 
EDI 2009- Advanced Search: What’s Under the Hood of your Favorite Search System?
EDI 2009- Advanced Search: What’s Under the Hood of your Favorite Search System?EDI 2009- Advanced Search: What’s Under the Hood of your Favorite Search System?
EDI 2009- Advanced Search: What’s Under the Hood of your Favorite Search System?
 
Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017Search powered by deep learning smart data 2017
Search powered by deep learning smart data 2017
 
Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017Search Powered by Deep Learning SmartData 2017
Search Powered by Deep Learning SmartData 2017
 
G04124041046
G04124041046G04124041046
G04124041046
 
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWLDODDLE-OWL: A Domain Ontology Construction Tool with OWL
DODDLE-OWL: A Domain Ontology Construction Tool with OWL
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
LLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team StructureLLMs in Production: Tooling, Process, and Team Structure
LLMs in Production: Tooling, Process, and Team Structure
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
 
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and VocabulariesHaystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Ay3313861388
Ay3313861388Ay3313861388
Ay3313861388
 
Natural Language Understanding of Systems Engineering Artifacts
Natural Language Understanding of Systems Engineering ArtifactsNatural Language Understanding of Systems Engineering Artifacts
Natural Language Understanding of Systems Engineering Artifacts
 
Searching Repositories of Web Application Models
Searching Repositories of Web Application ModelsSearching Repositories of Web Application Models
Searching Repositories of Web Application Models
 
Haystacks slides
Haystacks slidesHaystacks slides
Haystacks slides
 

Plus de Bhaskar Mitra

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationBhaskar Mitra
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?Bhaskar Mitra
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...Bhaskar Mitra
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Bhaskar Mitra
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressBhaskar Mitra
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackBhaskar Mitra
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBhaskar Mitra
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for RetrievalBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural NetworksBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Bhaskar Mitra
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalBhaskar Mitra
 

Plus de Bhaskar Mitra (20)

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and Recommendation
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and Recommendation
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning Track
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural Networks
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrieval
 

Neural Models for Information Retrieval

  • 1. NEURAL MODELS FOR INFORMATION RETRIEVAL BHASKAR MITRA Principal Applied Scientist Microsoft Research Student Dept. of Computer Science University College London
  • 2. NEURAL NETWORKS Amazingly successful on many difficult application areas Dominating multiple fields: Each application is different, motivates new innovations in machine learning 2011 2013 2015 2017 speech vision NLP IR? Our research: Novel representation learning methods and neural architectures motivated by specific needs and challenges of IR tasks
  • 3. TODAY’S TOPICS Information Retrieval (IR) tasks Representation learning for IR Topic specific representations Dealing with rare concepts/terms
  • 4. This talk is based on work done in collaboration with Nick Craswell, Fernando Diaz, Emine Yilmaz, Rich Caruana, and Eric Nalisnick Some of the content is from the manuscript under review for Foundations and Trends® in Information Retrieval Pre-print is available for free download http://bit.ly/neuralir-intro Final manuscript may contain additional content and changes
  • 6. DOCUMENT RANKING Information need query results ranking (document list) retrieval system indexes a document corpus Relevance (documents satisfy information need e.g. useful)
  • 7. OTHER IR TASKS QUERY FORMULATION NEXT QUERY SUGGESTION cheap flights from london t| cheap flights from london to frankfurt cheap flights from london to new york cheap flights from london to miami cheap flights from london to sydney cheap flights from london to miami Related searches package deals to miami ba flights to miami things to do in miami miami tourist attractions
  • 8. ANATOMY OF AN IR MODEL IR in three simple steps: 1. Generate input (query or prefix) representation 2. Generate candidate (document or suffix or query) representation 3. Estimate relevance based on input and candidate representations Neural networks can be useful for one or more of these steps
  • 9. EXAMPLE: CLASSICAL LEARNING TO RANK (LTR) MODELS 1. Query and document representation based on manually crafted features 2. Neural network for matching
  • 11. A QUICK REFRESHER ON VECTOR SPACE REPRESENTATIONS Represent items by feature vectors – similar items have similar vector representations Corollary: the choice of features defines which items are similar An embedding is a new space such that the properties of, and the relationships between, the items are preserved Compared to the original feature space, an embedding space may have one or more of the following: • Fewer dimensions • Less sparseness • Disentangled principal components
  • 12. NOTIONS OF SIMILARITY Is “Seattle” more similar to… “Sydney” (similar type) Or “Seahawks” (similar topic) Depends on what feature space you choose
  • 13. DESIRED NOTION OF SIMILARITY SHOULD DEPEND ON THE TARGET TASK DOCUMENT RANKING (query: cheap flights to london) ✓ budget flights to london ✗ cheap flights to sydney ✗ hotels in london QUERY FORMULATION (prefix: cheap flights to |) ✓ cheap flights to london ✓ cheap flights to sydney ✗ cheap flights to big ben NEXT QUERY SUGGESTION (previous query: cheap flights to london) ✓ budget flights to london ✓ hotels in london ✗ cheap flights to sydney Next, we take a sample model and show how the same model captures different notions of similarity based on the data it is trained on
  • 14. DEEP SEMANTIC SIMILARITY MODEL (DSSM) Siamese network with two deep sub-models Projects input and candidate texts into embedding space Trained by maximizing cosine similarity between correct input-output pairs
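The training objective on this slide can be illustrated with a minimal sketch. It assumes the two sub-models have already produced embedding vectors, and it turns "maximize cosine similarity for correct pairs" into a softmax loss over one correct candidate and a few sampled incorrect ones. The function names, the smoothing factor `gamma`, and the use of sampled negatives are assumptions of this sketch, not details taken from the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax_loss(input_vec, positive_vec, negative_vecs, gamma=10.0):
    """Negative log-probability of the correct candidate.

    The probability is a softmax over scaled cosine similarities of the
    correct candidate and the sampled incorrect candidates (hypothetical
    setup: gamma and the negatives are illustrative choices).
    """
    sims = [cosine(input_vec, positive_vec)]
    sims += [cosine(input_vec, neg) for neg in negative_vecs]
    logits = [gamma * s for s in sims]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[0]


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Loss is small when the correct candidate is closest to the input ...
    print(softmax_loss(query, [1.0, 0.0], [[0.0, 1.0]]))
    # ... and large when an incorrect candidate is closer.
    print(softmax_loss(query, [0.0, 1.0], [[1.0, 0.0]]))
```

Minimizing this loss pushes the cosine similarity of correct pairs up relative to the sampled incorrect ones, which is why the same network learns a different notion of similarity when the pairs come from different data.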
  • 15. SAME MODEL (DSSM), DIFFERENT TRAINING DATA (Shen et al., 2014) https://dl.acm.org/citation... (Mitra and Craswell, 2015) https://dl.acm.org/citation... DSSM trained on query-document pairs DSSM trained on query prefix-suffix pairs Nearest neighbors for “seattle” and “taylor swift” based on two DSSM models trained on different types of training data
  • 16. SAME DSSM, TRAINED ON SESSION QUERY PAIRS Can capture regularities in the query space (similar to word2vec for terms) (Mitra, 2015) https://dl.acm.org/citation... Groups of similar search intent transitions from a query log
  • 17. SAME DSSM, TRAINED ON SESSION QUERY PAIRS Allows for analogy-like vector algebra over short text!
  • 18. WHEN IS IT PARTICULARLY IMPORTANT TO THINK ABOUT NOTIONS OF SIMILARITY? If you are using pre-trained embeddings, instead of learning them in an end-to-end model for the target task, because not enough training data is available for fully supervised learning of representations
  • 19. USING PRE-TRAINED WORD EMBEDDINGS FOR DOCUMENT RANKING Traditional IR models count the number of query term matches in the document Non-matching terms contain important evidence of relevance We can leverage term embeddings to recognize that the document terms “population” and “area” indicate relevance of the document to the query “Albuquerque” [Figure: a passage about Albuquerque and a passage not about Albuquerque]
  • 20. USING WORD2VEC FOR SOFT-MATCHING Compare every query and document term in the embedding space What if I told you that everyone using word2vec is throwing half the model away?
  • 21. DUAL EMBEDDING SPACE MODEL (DESM) IN-OUT similarity captures a more Topical notion of term-term relationship compared to IN-IN and OUT-OUT Better to represent query terms using IN embeddings and document terms using OUT embeddings (Mitra et al., 2016) https://arxiv.org/abs/1602.01137
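A rough sketch of how IN-OUT matching could be scored follows. It assumes the formulation in which the document is represented by the centroid of its unit-normalized OUT vectors and the score is the mean cosine similarity between each query term's IN vector and that centroid. The tiny two-dimensional embeddings and the function names are made up for illustration; real IN and OUT vectors would come from a trained word2vec model.

```python
import math


def unit(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def desm_score(query_terms, doc_terms, in_emb, out_emb):
    """Mean cosine similarity between query IN vectors and the document's
    OUT centroid. Terms missing from the embedding tables are skipped."""
    query_vecs = [unit(in_emb[t]) for t in query_terms if t in in_emb]
    doc_vecs = [unit(out_emb[t]) for t in doc_terms if t in out_emb]
    if not query_vecs or not doc_vecs:
        return 0.0
    dim = len(doc_vecs[0])
    centroid = unit([sum(v[i] for v in doc_vecs) / len(doc_vecs)
                     for i in range(dim)])
    sims = [sum(a * b for a, b in zip(q, centroid)) for q in query_vecs]
    return sum(sims) / len(sims)


# Hypothetical toy embeddings, chosen only to make the example readable.
IN_EMB = {"albuquerque": [1.0, 0.0]}
OUT_EMB = {
    "population": [0.9, 0.1],
    "area": [0.8, 0.2],
    "guitar": [0.0, 1.0],
    "music": [0.1, 0.9],
}

if __name__ == "__main__":
    print(desm_score(["albuquerque"], ["population", "area"], IN_EMB, OUT_EMB))
    print(desm_score(["albuquerque"], ["guitar", "music"], IN_EMB, OUT_EMB))
```

With these toy vectors the passage mentioning "population" and "area" scores higher for the query "albuquerque" than the passage about music, even though neither passage contains the query term itself.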
  • 23. GET THE DATA IN+OUT Embeddings for 2.7M words trained on 600M+ Bing queries https://www.microsoft.com/en-us/download/details.aspx?id=52597
  • 25. WHY IS TOPIC-SPECIFIC TERM REPRESENTATION USEFUL? Terms can take different meanings in different contexts – global representations are likely to be coarse and inappropriate under certain topics A global model is likely to focus more on learning accurate representations of popular terms It is often impractical to train on the full corpus – without topic-specific sampling, important training instances may be ignored
  • 26. TOPIC-SPECIFIC TERM EMBEDDINGS FOR QUERY EXPANSION [Figure: the query “cut gasoline tax” is run against the corpus; the results are used to learn topic-specific term embeddings, which produce the expanded query “cut gasoline tax deficit budget” and then the final results] (Diaz et al., 2016) http://anthology.aclweb.org/... Use documents from the first round of retrieval to learn a query-specific embedding space Use the learnt embeddings to find related terms for query expansion in the second round of retrieval
  • 28. THE IDEA OF LEARNING (OR UPDATING) REPRESENTATIONS AT RUNTIME MAY BE APPLICABLE IN THE CONTEXT OF AI APPROACHES TO SOLVING OTHER MORE COMPLEX TASKS For IR, in practice… We can pre-train k (≈ millions of) different embedding spaces and pick the most appropriate representation at runtime without re-training
  • 30. A TALE OF TWO QUERIES Query: “pekarovic land company” Hard to learn good representation for rare term pekarovic But easy to estimate relevance based on patterns of exact matches Proposal: Learn a neural model to estimate relevance from patterns of exact matches Query: “what channel seahawks on today” Target document likely contains ESPN or sky sports instead of channel An embedding model can associate ESPN in document to channel in query Proposal: Learn embeddings of text and match query with document in the embedding space
  • 31. THE DUET ARCHITECTURE Linear combination of two models trained jointly on labelled query-document pairs Local model operates on lexical interaction matrix, while Distributed model projects text into embedding space and estimates match [Figure: Local model: query text and doc text → query and doc term vectors → interaction matrix → fully connected layers for matching. Distributed model: query text and doc text → query and doc embeddings → Hadamard product → fully connected layers for matching. The outputs of the two models are summed.] (Mitra et al., 2017) https://dl.acm.org/citation... (Nanni et al., 2017) https://dl.acm.org/citation...
  • 32. TERM IMPORTANCE LOCAL MODEL DISTRIBUTED MODEL Query: united states president
  • 33. LOCAL MODEL: TERM INTERACTION MATRIX $X_{i,j} = \begin{cases} 1, & \text{if } q_i = d_j \\ 0, & \text{otherwise} \end{cases}$ In relevant documents: → Many matches, typically clustered → Matches localized early in document → Matches for all query terms → In-order (phrasal) matches
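The interaction matrix defined on this slide can be built in a few lines. The sketch below assumes whitespace-tokenized, lower-cased terms and does not pad or truncate to fixed lengths; the function name and the example texts are illustrative only.

```python
def interaction_matrix(query_terms, doc_terms):
    """Binary matrix X with X[i][j] = 1 if query term i equals document
    term j, and 0 otherwise (rows: query terms, columns: document terms)."""
    return [[1 if q == d else 0 for d in doc_terms] for q in query_terms]


if __name__ == "__main__":
    query = "united states president".split()
    doc = "the president of the united states".split()
    for term, row in zip(query, interaction_matrix(query, doc)):
        print(f"{term:>10} {row}")
```

Each row shows where one query term matches in the document, so clustered, early, or in-order matches appear as visible patterns that a model operating on this matrix can pick up.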
  • 34. LOCAL MODEL: ESTIMATING RELEVANCE [Figure axis: ← document words →] Convolve using a window of size $n_d \times 1$ Each window instance compares a query term with the whole document Fully connected layers aggregate evidence across query terms and can model phrasal matches
  • 35. DISTRIBUTED MODEL: ESTIMATING RELEVANCE [Figure: query and document each pass through convolution and pooling; the query embedding is combined with the document representation by Hadamard product, followed by fully connected layers] Convolve over query and document terms Match query with moving windows over document Learn text embeddings specifically for the task Matching happens in embedding space * Network architecture slightly simplified for visualization – refer to the paper for exact details
  • 36. STRONG PERFORMANCE ON DOCUMENT AND ANSWER RANKING TASKS
  • 37. EFFECT OF TRAINING DATA VOLUME Key finding: large quantity of training data necessary for learning good representations, less impactful for training local model
  • 38. If we classify models by query level performance there is a clear clustering of lexical (local) and semantic (distributed) models
  • 39. GET THE CODE Implemented using the CNTK Python API https://github.com/bmitra-msft/NDRM/blob/master/notebooks/Duet.ipynb
  • 41. RECAP When using pre-trained embeddings consider whether the relationship between items in the embedding space is appropriate for the target task If a representation can be learnt (or updated) based on the current input / context it is likely to outperform a global representation It is difficult to learn good representations for rare items, but important to incorporate them in the modeling While these insights are grounded in IR tasks, they should be generally applicable to other NLP and machine learning scenarios
  • 42. LIST OF PAPERS DISCUSSED An Introduction to Neural Information Retrieval Bhaskar Mitra and Nick Craswell, in Foundations and Trends® in Information Retrieval, Now Publishers, 2017 (upcoming). https://www.microsoft.com/en-us/research/publication/... Learning to Match Using Local and Distributed Representations of Text for Web Search Bhaskar Mitra, Fernando Diaz, and Nick Craswell, in Proc. WWW, 2017. https://dl.acm.org/citation.cfm?id=3052579 Benchmark for Complex Answer Retrieval Federico Nanni, Bhaskar Mitra, Matt Magnusson, and Laura Dietz, in Proc. ICTIR, 2017. https://dl.acm.org/citation.cfm?id=3121099 Query Expansion with Locally-Trained Word Embeddings Fernando Diaz, Bhaskar Mitra, and Nick Craswell, in Proc. ACL, 2016 http://anthology.aclweb.org/P/P16/P16-1035.pdf A Dual Embedding Space Model for Document Ranking Bhaskar Mitra, Eric Nalisnick, Nick Craswell, and Rich Caruana, arXiv preprint, 2016 https://arxiv.org/abs/1602.01137 Improving Document Ranking with Dual Word Embeddings Eric Nalisnick, Bhaskar Mitra, Nick Craswell, and Rich Caruana, in Proc. WWW, 2016 https://dl.acm.org/citation.cfm?id=2889361 Query Auto-Completion for Rare Prefixes Bhaskar Mitra and Nick Craswell, in Proc. CIKM, 2015 https://dl.acm.org/citation.cfm?id=2806599 Exploring Session Context using Distributed Representations of Queries and Reformulations Bhaskar Mitra, in Proc. SIGIR, 2015 https://dl.acm.org/citation.cfm?id=2766462.2767702