SlideShare une entreprise Scribd logo
Improving search with
neural ranking methods
Andrew Yates
Assistant Professor, UvA
@andrewyates
About me
Assistant Professor at University of Amsterdam (2021-)
• Information retrieval, neural networks, natural language processing,
(consumer) medical domain, personal knowledge graphs, …
Previously
• Senior Researcher (2018-2021) at Max Planck Institute for Informatics;
Postdoc at MPII before that (2016-2018)
• PhD in Computer Science from Georgetown University, Washington, DC
• BSc in Computer Science from Illinois Institute of Technology, Chicago
2
3
Book on Neural IR
(link on final slide)
Podcast on Neural IR
Available: Spotify, iTunes, …
4
Students and collaborators: Siddhant Arora, Arman Cohan, Jeffrey Dalton, Doug Downey, Sergey Feldman,
Nazli Goharian, Kevin Martin Jose, Jimmy Lin, Sean MacAvaney, Thong Nguyen, Wei Yang, Xinyu Zhang, …
Capreolus: A Toolkit for End-to-End
Neural Ad Hoc Retrieval
Outline
• About me
• What is NIR?
• The good, the bad, and the ugly
5
Neural? IR?
Focus on ad hoc retrieval, better known as “googling something”
➔ We all do this a lot, and it works incredibly well
➔ Given an arbitrary query, return ranked list of relevant “documents”
6
query
black bear attacks
...
collection
1.
2.
3.
metric:
0.66
ranking
method
Retrieval 101
• Crawl & index documents (text)
• Query time: look for exact matches in documents,
optionally combining with other signals (e.g., spam score)
7
Retrieval 101
• Consider query terms that appear in document
• Weight them by discriminativeness (inverse document frequency)
and importance (term frequency)
8
IDF component TF component
term saturation length
normalization
Consider only terms in
both query & document
Retrieval 101
9
• Consider query terms that appear in document
• Weight them by discriminativeness (inverse document frequency)
and importance (term frequency)
General formula
Neural retrieval?
Neural refers to a type of machine learning method,
which bring two key improvements
10
Improvement #1: soft matching of terms
Exact matching of
Q and D terms
Non-neural approaches: translation models, topic models like pLSI, ...
11
Inverse Document
Frequency of query term
(collection stat)
Term Frequency of
q in document D
(document stat)
Improvement #2: less handcrafting
(learn how to score)
12
Inverse Document
Frequency of query term
Term Frequency of
q in document D
N = # docs and ni
= # docs with term qi
IDF defined as:
• log N/ni
• log (1 + N/ni
)
• log (N- ni
)/ni
• …
13
Inverse Document
Frequency of query term
Term Frequency of
q in document D
TF defined as:
• TF(qi
, D)
• 1 + log TF(qi
, D)
• 0.5 + 0.5 [TF(qi
, D) / max TF(D)]
• …
14
Soft matching
Key point: neural methods replace exact matching with semantic matching
With traditional methods, soft matching is possible (but it’s less effective)
● Stemming handles matches like singular vs. plural and different tenses
swam, swimming, swim, swims replaced with swim
● Query expansion approaches handle context by adding related terms to query
Query contains bank, referring to turning a plane
Add context terms to query: flight, airplane, ailerons, spoilers, ...
15
Soft matching via embeddings
16
Source: https://medium.com/@hari4om/word-embedding-d816f643140
Neural approaches
represent words
as learned vectors
Soft matching via embeddings
17
Source: https://medium.com/@hari4om/word-embedding-d816f643140
Vector “embedding” allows comparisons of different terms
Soft matching via embeddings
18
Source: https://towardsdatascience.com/deep-learning-4-embedding-layers-f9a02d55ac12
Embeddings capture a general notion of semantic similarity
… … … … … … … …
…
[SEP]
EA
P8
E[CLS]
T[CLS]
[CLS]
E1
T1
A1
E2
T2
A2
E3
T3
A3
E4
T4
[SEP]
E5
T5
B1
E6
T6
B2
E7
T7
E[SEP]
T[SEP]
[CLS] tThe
tbank
tmakes
tloans
t[MASK]
tclients
EA
EA
EA
EA
EA
EA
EA
P0
P1
P2
P3
P4
P5
P6
+ + + + + + + +
+ + + + + + + +
Token
Embeddings
Segment
Embeddings
Position
Embeddings
t.
EA
P7
+
+
The bank makes loans [MASK] clients .
Loss = -log (P("to" | masked input))
Random
masking
19
D ྾ V
softmax
Embeddings can be learned automatically
Randomly-
initialized
Large
Language
Model
(>100 million
weights)
Devlin, Chang, Lee,
Toutanova. BERT: Pre-training
of Deep Bidirectional
Transformers for Language
Understanding. NAACL 2019.
“BERT”
… … … … … … … …
…
[SEP]
EA
P8
E[CLS]
T[CLS]
[CLS]
E1
T1
A1
E2
T2
A2
E3
T3
A3
E4
T4
[SEP]
E5
T5
B1
E6
T6
B2
E7
T7
E[SEP]
T[SEP]
[CLS] tThe
tbank
tmakes
tloans
t[MASK]
tclients
EA
EA
EA
EA
EA
EA
EA
P0
P1
P2
P3
P4
P5
P6
+ + + + + + + +
+ + + + + + + +
Token
Embeddings
Segment
Embeddings
Position
Embeddings
t.
EA
P7
+
+
The bank makes loans [MASK] clients .
Loss = -log (P("to" | masked input))
Random
masking
20
D ྾ V
softmax
Embeddings can be learned automatically
Advantages
● Automatically learn semantic representations of words!
● These embeddings capture general similarity
Disadvantages
● These embeddings capture general similarity
Randomly-
initialized
Large
Language
Model
(>100 million
weights)
Neural IR: the good, the bad, and the ugly
• The good
• Big improvements in search quality
• Key ingredient: putting words in context
• Nice property: zero-shot ranking works
• The bad
• The ugly
21
source
We’re making a significant improvement to how we
understand queries, representing the biggest leap
forward in the past five years, and one of the
biggest leaps forward in the history of Search.
Starting from April of this year (2019), we used
large transformer models to deliver the largest
quality improvements to our Bing customers in
the past year.
MS Bing
Google Search
source
22
Search quality improvements: industry
Source: Yang et al., SIGIR 2019
Yilmaz et al., 2019;
MacAvaney et al. 2019
Li et al., 2020;
Nogueira et al., 2020
23
Robust04 Dataset (news articles)
Search quality improvements: academia
~8 points!
Search quality improvements: academia
24
MS MARCO Passage Dataset (Web pages)
Source: MS MARCO Leaderboard
19 point improvement: BM25 vs. best neural
8 point improvement: neural pre-LLM vs. LLM
Neural IR: the good, the bad, and the ugly
• The good
• Big improvements in search quality
• Key ingredient: putting words in context
• Nice property: zero-shot ranking works
• The bad
• The ugly
25
Key ingredient: contextualization
Modern approaches use a LLM (BERT) to create contextualized embeddings
26
Relevant
Document
Nonrelevant
MacAvaney, Yates, Cohan, Goharian. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019.
Key ingredient: contextualization
Score can be produced by placing document terms in similarity buckets,
then computing score based on the size of each bucket
27
sim = 1.0 0.6 < sim < 1.0 sim <= 0.6
Tesla 4 terms 3 terms 20 terms
net 1 term 8 terms 40 terms
worth 2 terms 2 terms 15 terms
MacAvaney, Yates, Cohan, Goharian. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019.
Guo, Fan, Ai, Croft. A Deep Relevance Matching Model for Ad-hoc Retrieval. CIKM 2016.
Query
Terms
Similarity Buckets
Score(Tesla) = 4*a + 3*b + 20*c
Total score = Score(Tesla) + Score(net) + Score(worth)
Neural IR: the good, the bad, and the ugly
• The good
• Big improvements in search quality
• Key ingredient: putting words in context
• Nice property: zero-shot ranking works
• The bad
• The ugly
28
Zero-shot ranking
Zero-shot: “this is my first attempt at a task”
➔ We typically think of search as zero-shot
If Apple releases iDevice tomorrow, can immediately search for it
➔ In experiments, operationalized as “this is my first attempt at this dataset”
29
A Little Bit Is Worse Than None: Ranking with Limited Training Data. Zhang, Yates, Lin. SustaiNLP 2020.
30
Zero-shot ranking
A Little Bit Is Worse Than None: Ranking with Limited Training Data. Zhang, Yates, Lin. SustaiNLP 2020.
No improvement training with
up to 50% of in-domain data
Neural method
(zero-shot)
Traditional methods
Neural IR: the good, the bad, and the ugly
• The good
• The bad
• Which approach is best? Depends on what we measure
• Zero-shot effectiveness varies a lot
• The ugly
31
Many categories of queries are ‘solved’ by current systems
121171: define etruscans
1113256: what is reba mcentire's net
worth
701453: what is a statutory deed
Basic definition
Simple factoid QA
Easy description
What should we measure?
How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
But others are more difficult
1037798: who is robert gray
10486: cost of interior concrete flooring
Ambiguous
Ambiguous and localized
144862: did prohibition increased crime
Complex reasoning
507445: symptoms of different types of brain
bleeds
Concerns multiple entities
What should we measure?
How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
144862: did prohibition increased crime
Complex reasoning
507445: symptoms of different types of brain
bleeds
Concerns multiple entities
Hard + Interesting
How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
Criteria
• Non-Factoid: Not answerable by a single short answer
• Beyond single passage: More than a simple definition
• Answerable: Answerable solely from the provided context
• Text-focused: Non-text answers or calculations should be excluded
• Mostly well-formed: Not contain clear spelling or language errors
• Possibly Complex: Multiple entities, comparison, complex reasoning, or multiple answers
Entity Linking
Query
Annotations
Web Search
Engine Annotations
TREC Deep Learning Collection
Queries Runs
What makes a query difficult & interesting?
41%
Factoid 10%
List
18%
Long Answer
24%
Web Search
8%
Factoid 31%
List
27%
Long Answer
35%
Web Search
TREC Deep Learning Collection DL-HARD Collection
The query type matters
How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
41%
Factoid 10%
List
18%
Long Answer
24%
Web Search
8%
Factoid 31%
List
27%
Long Answer
35%
Web Search
TREC Deep Learning Collection DL-HARD Collection
The query type matters
How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
Swaps in system rankings
● rank 1 → rank 4
● rank 13 → rank 5
● bottom → bottom + 1
● etc.
Neural IR: the good, the bad, and the ugly
• The good
• The bad
• Which approach is best? Depends on what we measure
• Zero-shot effectiveness varies a lot
• The ugly
38
Zero-shot: reranking > dense retrieval
39
Reranking approaches compare individual query and document terms
40
s
Zero-shot: reranking > dense retrieval
Instead of comparing terms, entire query/doc represented by single vector
Neural IR: the good, the bad, and the ugly
• The good
• The bad
• The ugly
• General similarity: prone to capturing bias and prone to
using shortcuts (which do not generalize)
• Little control over what knowledge is stored inside the neural model
41
42
Term gender doctor who women dr medical her pregnancy … birth baby number
Weight 112 96 71 67 60 55 52 43 … 28 24 7
What’s inside a dense representation?
query
Tesla’s net worth
0.0
0.2
0.3
2.5
1.7
0.8
query vector
Converter
term weights
tesla
worth
net
price
car
…
…
…
32
23
20
18
11
Query: what is the gender of that doctor?
Ongoing work with Thong Nguyen, IRLab, UvA.
Neural IR: the good, the bad, and the ugly
• The good
• The bad
• The ugly
• General similarity: prone to capturing bias and prone to
using shortcuts (which do not generalize)
• Little control over what knowledge is stored inside the neural model
43
44
Language Models as Knowledge Bases? Petroni, Rocktaeschel, Lewis, Bakhtin, Wu, Miller, Riedel. EMNLP 2019.
Language Models As or For Knowledge Bases. Razniewski, Yates, Kassner, Weikum. DL4KG 2021.
What’s inside a large language model?
Probe a LLM’s word predictions to see how it behaves
➔ Alan Turing was born in [MASK]
Advanced LLMs are often correct, but fundamental issues remain
• Epistemic: Probabilities ≠ knowledge
• Birthplace(Biden) – Scranton (83%), New Castle (0.2%)
• Birth country (Bengio) – France (3.5%), Canada (3.2%)
• Fallbacks to correlations
• Are all Greeks born in Athens?
• Are all Panagiotis’ Greek?
• Unaware of limits
• Microsoft was acquired by … Novell (84%)
• Sun Microsystems was acquired by … Oracle (84%)
Conclusion
• The good: neural methods have significantly improved ranking quality
• The bad: so far, no fast & effective "off the shelf" approach for general use
• The ugly: is the bad caused by fundamental limitations of this paradigm?
45
Podcast on Neural IR
Available: Spotify, iTunes, …
https://arxiv.org/abs/2010.06467
https://bit.ly/tr4tr-book
Conclusion
• The good: neural methods have significantly improved ranking quality
• The bad: so far, no fast & effective "off the shelf" approach for general use
• The ugly: is the bad caused by fundamental limitations of this paradigm?
46
Podcast on Neural IR
Available: Spotify, iTunes, …
https://arxiv.org/abs/2010.06467
https://bit.ly/tr4tr-book
Thanks!

Contenu connexe

Similaire à Improving search with neural ranking methods

Deep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationDeep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text Classification
HPCC Systems
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
Akshay Hegde
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
Roelof Pieters
 
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
National Information Standards Organization (NISO)
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Databricks
 
asdrfasdfasdf
asdrfasdfasdfasdrfasdfasdf
asdrfasdfasdf
SwayattaDaw1
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Claudio Greco
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Alessandro Suglia
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
MENGSAYLOEM1
 
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...
Andrew Gardner
 
How to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamHow to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data Team
Traveloka
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
ebelani
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
DataTactics
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
Rich Heimann
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptx
fahmi324663
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
Kira
 
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
National Institute of Informatics
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
butest
 
A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2
Trector Rancor
 

Similaire à Improving search with neural ranking methods (20)

Deep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text ClassificationDeep Content Learning in Traffic Prediction and Text Classification
Deep Content Learning in Traffic Prediction and Text Classification
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
asdrfasdfasdf
asdrfasdfasdfasdrfasdfasdf
asdrfasdfasdf
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...
Deep Learning for Data Scientists - Data Science ATL Meetup Presentation, 201...
 
How to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamHow to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data Team
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptx
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
A SVM Applied Text Categorization of Academia-Industry Collaborative Research...
 
PowerPoint Template
PowerPoint TemplatePowerPoint Template
PowerPoint Template
 
A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2A Comparative Study On Featuree Selection In Text2
A Comparative Study On Featuree Selection In Text2
 

Plus de voginip

Zo wordt je factchecker - Aafko Boonstra
Zo wordt je factchecker - Aafko BoonstraZo wordt je factchecker - Aafko Boonstra
Zo wordt je factchecker - Aafko Boonstra
voginip
 
Automatisch metadateren - de kansen en de uitdagingen
Automatisch metadateren - de kansen en de uitdagingenAutomatisch metadateren - de kansen en de uitdagingen
Automatisch metadateren - de kansen en de uitdagingen
voginip
 
Hybride Intelligentie: de rol van Large Language Models in informatieverwerking
Hybride Intelligentie: de rol van Large Language Models in informatieverwerkingHybride Intelligentie: de rol van Large Language Models in informatieverwerking
Hybride Intelligentie: de rol van Large Language Models in informatieverwerking
voginip
 
Solving World War II Photo Mysteries with Open Source Techniques
Solving World War II Photo Mysteries with Open Source TechniquesSolving World War II Photo Mysteries with Open Source Techniques
Solving World War II Photo Mysteries with Open Source Techniques
voginip
 
PiCo: Historische personen beter vindbaar maken
PiCo: Historische personen beter vindbaar makenPiCo: Historische personen beter vindbaar maken
PiCo: Historische personen beter vindbaar maken
voginip
 
Red het internet! Op weg naar de online publieke ruimte
Red het internet! Op weg naar de online publieke ruimteRed het internet! Op weg naar de online publieke ruimte
Red het internet! Op weg naar de online publieke ruimte
voginip
 
AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)
AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)
AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)
voginip
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
voginip
 
The Dark Side of Science: Misconduct in Biomedical Research
The Dark Side of Science: Misconduct in Biomedical ResearchThe Dark Side of Science: Misconduct in Biomedical Research
The Dark Side of Science: Misconduct in Biomedical Research
voginip
 
Oude boeken, nieuwe vaardigheden en Wikipedia
Oude boeken, nieuwe vaardigheden en WikipediaOude boeken, nieuwe vaardigheden en Wikipedia
Oude boeken, nieuwe vaardigheden en Wikipedia
voginip
 
De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...
De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...
De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...
voginip
 
Open yet everywhere in chains: Where next for open knowledge?
Open yet everywhere in chains: Where next for open knowledge?Open yet everywhere in chains: Where next for open knowledge?
Open yet everywhere in chains: Where next for open knowledge?
voginip
 
The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...
voginip
 
Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...
Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...
Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...
voginip
 
Why one-size-fits all does not work in Explainable Artificial Intelligence!
Why one-size-fits all does not work in Explainable Artificial Intelligence!Why one-size-fits all does not work in Explainable Artificial Intelligence!
Why one-size-fits all does not work in Explainable Artificial Intelligence!
voginip
 
Systematisch zoeken op het web
Systematisch zoeken op het webSystematisch zoeken op het web
Systematisch zoeken op het web
voginip
 
Grote hoeveelheden tekst analyseren als data
Grote hoeveelheden tekst analyseren als dataGrote hoeveelheden tekst analyseren als data
Grote hoeveelheden tekst analyseren als data
voginip
 
Werken met Wikidata
Werken met WikidataWerken met Wikidata
Werken met Wikidata
voginip
 
Een gereedschapskist voor digitale vaardigheden
Een gereedschapskist voor digitale vaardighedenEen gereedschapskist voor digitale vaardigheden
Een gereedschapskist voor digitale vaardigheden
voginip
 
Een startende éénpitter in informatieland: wat goed ging en wat niet
Een startende éénpitter in informatieland: wat goed ging en wat nietEen startende éénpitter in informatieland: wat goed ging en wat niet
Een startende éénpitter in informatieland: wat goed ging en wat niet
voginip
 

Plus de voginip (20)

Zo wordt je factchecker - Aafko Boonstra
Zo wordt je factchecker - Aafko BoonstraZo wordt je factchecker - Aafko Boonstra
Zo wordt je factchecker - Aafko Boonstra
 
Automatisch metadateren - de kansen en de uitdagingen
Automatisch metadateren - de kansen en de uitdagingenAutomatisch metadateren - de kansen en de uitdagingen
Automatisch metadateren - de kansen en de uitdagingen
 
Hybride Intelligentie: de rol van Large Language Models in informatieverwerking
Hybride Intelligentie: de rol van Large Language Models in informatieverwerkingHybride Intelligentie: de rol van Large Language Models in informatieverwerking
Hybride Intelligentie: de rol van Large Language Models in informatieverwerking
 
Solving World War II Photo Mysteries with Open Source Techniques
Solving World War II Photo Mysteries with Open Source TechniquesSolving World War II Photo Mysteries with Open Source Techniques
Solving World War II Photo Mysteries with Open Source Techniques
 
PiCo: Historische personen beter vindbaar maken
PiCo: Historische personen beter vindbaar makenPiCo: Historische personen beter vindbaar maken
PiCo: Historische personen beter vindbaar maken
 
Red het internet! Op weg naar de online publieke ruimte
Red het internet! Op weg naar de online publieke ruimteRed het internet! Op weg naar de online publieke ruimte
Red het internet! Op weg naar de online publieke ruimte
 
AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)
AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)
AI en IP (Artificieele Intelligentie en Intellectueel Eigendom)
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
The Dark Side of Science: Misconduct in Biomedical Research
The Dark Side of Science: Misconduct in Biomedical ResearchThe Dark Side of Science: Misconduct in Biomedical Research
The Dark Side of Science: Misconduct in Biomedical Research
 
Oude boeken, nieuwe vaardigheden en Wikipedia
Oude boeken, nieuwe vaardigheden en WikipediaOude boeken, nieuwe vaardigheden en Wikipedia
Oude boeken, nieuwe vaardigheden en Wikipedia
 
De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...
De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...
De kracht van samenwerking: hoe de Universiteitsbibliotheek Gent open kennisc...
 
Open yet everywhere in chains: Where next for open knowledge?
Open yet everywhere in chains: Where next for open knowledge?Open yet everywhere in chains: Where next for open knowledge?
Open yet everywhere in chains: Where next for open knowledge?
 
The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...The three layers of a knowledge graph and what it means for authoring, storag...
The three layers of a knowledge graph and what it means for authoring, storag...
 
Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...
Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...
Vijf vindbaarheidsproblemen waar een taxonomie de schuld van krijgt (maar nik...
 
Why one-size-fits all does not work in Explainable Artificial Intelligence!
Why one-size-fits all does not work in Explainable Artificial Intelligence!Why one-size-fits all does not work in Explainable Artificial Intelligence!
Why one-size-fits all does not work in Explainable Artificial Intelligence!
 
Systematisch zoeken op het web
Systematisch zoeken op het webSystematisch zoeken op het web
Systematisch zoeken op het web
 
Grote hoeveelheden tekst analyseren als data
Grote hoeveelheden tekst analyseren als dataGrote hoeveelheden tekst analyseren als data
Grote hoeveelheden tekst analyseren als data
 
Werken met Wikidata
Werken met WikidataWerken met Wikidata
Werken met Wikidata
 
Een gereedschapskist voor digitale vaardigheden
Een gereedschapskist voor digitale vaardighedenEen gereedschapskist voor digitale vaardigheden
Een gereedschapskist voor digitale vaardigheden
 
Een startende éénpitter in informatieland: wat goed ging en wat niet
Een startende éénpitter in informatieland: wat goed ging en wat nietEen startende éénpitter in informatieland: wat goed ging en wat niet
Een startende éénpitter in informatieland: wat goed ging en wat niet
 

Dernier

Trump fist pump t shirts Trump fist pump t shirts
Trump fist pump t shirts Trump fist pump t shirtsTrump fist pump t shirts Trump fist pump t shirts
Trump fist pump t shirts Trump fist pump t shirts
exgf28
 
Team Cymru Community Services,Overview of all public services
Team Cymru Community Services,Overview of all public servicesTeam Cymru Community Services,Overview of all public services
Team Cymru Community Services,Overview of all public services
Bangladesh Network Operators Group
 
Saint Louis University diploma
Saint Louis University diplomaSaint Louis University diploma
Saint Louis University diploma
eufdev
 
Portugal Dreamin 24 - How to easily use an API with Flows
Portugal Dreamin 24  - How to easily use an API with FlowsPortugal Dreamin 24  - How to easily use an API with Flows
Portugal Dreamin 24 - How to easily use an API with Flows
Thierry TROUIN ☁
 
Open Source TCP or Netflow Log Server Using Graylog
Open Source TCP or Netflow Log Server Using GraylogOpen Source TCP or Netflow Log Server Using Graylog
Open Source TCP or Netflow Log Server Using Graylog
Bangladesh Network Operators Group
 
New York Institute of Technology degree Cert diploma offer
New York Institute of Technology degree Cert diploma offerNew York Institute of Technology degree Cert diploma offer
New York Institute of Technology degree Cert diploma offer
ubovu
 
upgrade to zabbix-7 0 como atualiza lts1
upgrade to zabbix-7 0 como atualiza lts1upgrade to zabbix-7 0 como atualiza lts1
upgrade to zabbix-7 0 como atualiza lts1
diogolsew
 
University of California, Riverside diploma
University of California, Riverside diplomaUniversity of California, Riverside diploma
University of California, Riverside diploma
eufdev
 
Best CSS Animation Libraries for Web Developers
Best CSS Animation Libraries for Web DevelopersBest CSS Animation Libraries for Web Developers
Best CSS Animation Libraries for Web Developers
Shrestha Raaz
 
Geolocation and Geofeed Implementation bdNOG18
Geolocation and Geofeed Implementation bdNOG18Geolocation and Geofeed Implementation bdNOG18
Geolocation and Geofeed Implementation bdNOG18
Bangladesh Network Operators Group
 
How God led me to DTS? Through many different signs and connections that I c...
How God led me to DTS? Through many different signs and connections that  I c...How God led me to DTS? Through many different signs and connections that  I c...
How God led me to DTS? Through many different signs and connections that I c...
AshishMohan57
 
202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。
202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。
202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。
yilin01100
 
Top 50 Telephone Conversation Sample Examples For IT Industries.pdf
Top 50 Telephone Conversation Sample Examples For IT Industries.pdfTop 50 Telephone Conversation Sample Examples For IT Industries.pdf
Top 50 Telephone Conversation Sample Examples For IT Industries.pdf
Krishna L
 
Study of international anticancer research trends.pdf
Study of international anticancer research trends.pdfStudy of international anticancer research trends.pdf
Study of international anticancer research trends.pdf
Preston University
 
Free PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdf
Free PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdfFree PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdf
Free PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdf
LinhHershey
 
Enhancing seamless access using TIGERfed
Enhancing seamless access using TIGERfedEnhancing seamless access using TIGERfed
Enhancing seamless access using TIGERfed
Bangladesh Network Operators Group
 
Rent remote desktop server mangohost .net
Rent remote desktop server mangohost .netRent remote desktop server mangohost .net
Rent remote desktop server mangohost .net
pdfsubmission50
 
Mobile SEO India | Mobile SEO Service | Mobile SEO Company
Mobile SEO India | Mobile SEO Service | Mobile SEO CompanyMobile SEO India | Mobile SEO Service | Mobile SEO Company
Mobile SEO India | Mobile SEO Service | Mobile SEO Company
SIB Infotech
 
Do it again anti Republican shirt Do it again anti Republican shirt
Do it again anti Republican shirt Do it again anti Republican shirtDo it again anti Republican shirt Do it again anti Republican shirt
Do it again anti Republican shirt Do it again anti Republican shirt
exgf28
 
Digital ethnography of the Polish darknet drug trade community
Digital ethnography of the Polish darknet drug trade communityDigital ethnography of the Polish darknet drug trade community
Digital ethnography of the Polish darknet drug trade community
Piotr Siuda
 

Dernier (20)

Trump fist pump t shirts Trump fist pump t shirts
Trump fist pump t shirts Trump fist pump t shirtsTrump fist pump t shirts Trump fist pump t shirts
Trump fist pump t shirts Trump fist pump t shirts
 
Team Cymru Community Services,Overview of all public services
Team Cymru Community Services,Overview of all public servicesTeam Cymru Community Services,Overview of all public services
Team Cymru Community Services,Overview of all public services
 
Saint Louis University diploma
Saint Louis University diplomaSaint Louis University diploma
Saint Louis University diploma
 
Portugal Dreamin 24 - How to easily use an API with Flows
Portugal Dreamin 24  - How to easily use an API with FlowsPortugal Dreamin 24  - How to easily use an API with Flows
Portugal Dreamin 24 - How to easily use an API with Flows
 
Open Source TCP or Netflow Log Server Using Graylog
Open Source TCP or Netflow Log Server Using GraylogOpen Source TCP or Netflow Log Server Using Graylog
Open Source TCP or Netflow Log Server Using Graylog
 
New York Institute of Technology degree Cert diploma offer
New York Institute of Technology degree Cert diploma offerNew York Institute of Technology degree Cert diploma offer
New York Institute of Technology degree Cert diploma offer
 
upgrade to zabbix-7 0 como atualiza lts1
upgrade to zabbix-7 0 como atualiza lts1upgrade to zabbix-7 0 como atualiza lts1
upgrade to zabbix-7 0 como atualiza lts1
 
University of California, Riverside diploma
University of California, Riverside diplomaUniversity of California, Riverside diploma
University of California, Riverside diploma
 
Best CSS Animation Libraries for Web Developers
Best CSS Animation Libraries for Web DevelopersBest CSS Animation Libraries for Web Developers
Best CSS Animation Libraries for Web Developers
 
Geolocation and Geofeed Implementation bdNOG18
Geolocation and Geofeed Implementation bdNOG18Geolocation and Geofeed Implementation bdNOG18
Geolocation and Geofeed Implementation bdNOG18
 
How God led me to DTS? Through many different signs and connections that I c...
How God led me to DTS? Through many different signs and connections that  I c...How God led me to DTS? Through many different signs and connections that  I c...
How God led me to DTS? Through many different signs and connections that I c...
 
202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。
202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。
202254.com香蕉影视,沙丘2在线播放,沙丘2线上看,最新电影沙丘2在线,热门电影推荐,2024最新科幻片推荐。
 
Top 50 Telephone Conversation Sample Examples For IT Industries.pdf
Top 50 Telephone Conversation Sample Examples For IT Industries.pdfTop 50 Telephone Conversation Sample Examples For IT Industries.pdf
Top 50 Telephone Conversation Sample Examples For IT Industries.pdf
 
Study of international anticancer research trends.pdf
Study of international anticancer research trends.pdfStudy of international anticancer research trends.pdf
Study of international anticancer research trends.pdf
 
Free PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdf
Free PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdfFree PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdf
Free PMP Practice Exam Questions - 120 Sample Test Questions_ Answer.pdf
 
Enhancing seamless access using TIGERfed
Enhancing seamless access using TIGERfedEnhancing seamless access using TIGERfed
Enhancing seamless access using TIGERfed
 
Rent remote desktop server mangohost .net
Rent remote desktop server mangohost .netRent remote desktop server mangohost .net
Rent remote desktop server mangohost .net
 
Mobile SEO India | Mobile SEO Service | Mobile SEO Company
Mobile SEO India | Mobile SEO Service | Mobile SEO CompanyMobile SEO India | Mobile SEO Service | Mobile SEO Company
Mobile SEO India | Mobile SEO Service | Mobile SEO Company
 
Do it again anti Republican shirt Do it again anti Republican shirt
Do it again anti Republican shirt Do it again anti Republican shirtDo it again anti Republican shirt Do it again anti Republican shirt
Do it again anti Republican shirt Do it again anti Republican shirt
 
Digital ethnography of the Polish darknet drug trade community
Digital ethnography of the Polish darknet drug trade communityDigital ethnography of the Polish darknet drug trade community
Digital ethnography of the Polish darknet drug trade community
 

Improving search with neural ranking methods

  • 1. Improving search with neural ranking methods Andrew Yates Assistant Professor, UvA @andrewyates
  • 2. About me Assistant Professor at University of Amsterdam (2021-) • Information retrieval, neural networks, natural language processing, (consumer) medical domain, personal knowledge graphs, … Previously • Senior Researcher (2018-2021) at Max Planck Institute for Informatics; Postdoc at MPII before that (2016-2018) • PhD in Computer Science from Georgetown University, Washington, DC • BSc in Computer Science from Illinois Institute of Technology, Chicago 2
  • 3. 3 Book on Neural IR (link on final slide) Podcast on Neural IR Available: Spotify, iTunes, …
  • 4. 4 Students and collaborators: Siddhant Arora, Arman Cohan, Jeffrey Dalton, Doug Downey, Sergey Feldman, Nazli Goharian, Kevin Martin Jose, Jimmy Lin, Sean MacAvaney, Thong Nguyen, Wei Yang, Xinyu Zhang, … Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval
  • 5. Outline • About me • What is NIR? • The good, the bad, and the ugly 5
  • 6. Neural? IR? Focus on ad hoc retrieval, better known as “googling something” ➔ We all do this a lot, and it works incredibly well ➔ Given an arbitrary query, return ranked list of relevant “documents” 6 query black bear attacks ... collection 1. 2. 3. metric: 0.66 ranking method
  • 7. Retrieval 101 • Crawl & index documents (text) • Query time: look for exact matches in documents, optionally combining with other signals (e.g., spam score) 7
  • 8. Retrieval 101 • Consider query terms that appear in document • Weight them by discriminativeness (inverse document frequency) and importance (term frequency) 8 IDF component TF component term saturation length normalization Consider only terms in both query & document
  • 9. Retrieval 101 9 • Consider query terms that appear in document • Weight them by discriminativeness (inverse document frequency) and importance (term frequency) General formula
  • 10. Neural retrieval? Neural refers to a type of machine learning method, which bring two key improvements 10
  • 11. Improvement #1: soft matching of terms Exact matching of Q and D terms Non-neural approaches: translation models, topic models like pLSI, ... 11
  • 12. Inverse Document Frequency of query term (collection stat) Term Frequency of q in document D (document stat) Improvement #2: less handcrafting (learn how to score) 12
  • 13. Inverse Document Frequency of query term Term Frequency of q in document D N = # docs and ni = # docs with term qi IDF defined as: • log N/ni • log (1 + N/ni ) • log (N- ni )/ni • … 13
  • 14. Inverse Document Frequency of query term Term Frequency of q in document D TF defined as: • TF(qi , D) • 1 + log TF(qi , D) • 0.5 + 0.5 [TF(qi , D) / max TF(D)] • … 14
  • 15. Soft matching Key point: neural methods replace exact matching with semantic matching With traditional methods, soft matching is possible (but it’s less effective) ● Stemming handles matches like singular vs. plural and different tenses swam, swimming, swim, swims replaced with swim ● Query expansion approaches handle context by adding related terms to query Query contains bank, referring to turning a plane Add context terms to query: flight, airplane, ailerons, spoilers, ... 15
  • 16. Soft matching via embeddings 16 Source: https://medium.com/@hari4om/word-embedding-d816f643140 Neural approaches represent words as learned vectors
  • 17. Soft matching via embeddings 17 Source: https://medium.com/@hari4om/word-embedding-d816f643140 Vector “embedding” allows comparisons of different terms
  • 18. Soft matching via embeddings 18 Source: https://towardsdatascience.com/deep-learning-4-embedding-layers-f9a02d55ac12 Embeddings capture a general notion of semantic similarity
  • 19. … … … … … … … … … [SEP] EA P8 E[CLS] T[CLS] [CLS] E1 T1 A1 E2 T2 A2 E3 T3 A3 E4 T4 [SEP] E5 T5 B1 E6 T6 B2 E7 T7 E[SEP] T[SEP] [CLS] tThe tbank tmakes tloans t[MASK] tclients EA EA EA EA EA EA EA P0 P1 P2 P3 P4 P5 P6 + + + + + + + + + + + + + + + + Token Embeddings Segment Embeddings Position Embeddings t. EA P7 + + The bank makes loans [MASK] clients . Loss = -log (P("to" | masked input)) Random masking 19 D ྾ V softmax Embeddings can be learned automatically Randomly- initialized Large Language Model (>100 million weights) Devlin, Chang, Lee, Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. “BERT”
  • 20. … … … … … … … … … [SEP] EA P8 E[CLS] T[CLS] [CLS] E1 T1 A1 E2 T2 A2 E3 T3 A3 E4 T4 [SEP] E5 T5 B1 E6 T6 B2 E7 T7 E[SEP] T[SEP] [CLS] tThe tbank tmakes tloans t[MASK] tclients EA EA EA EA EA EA EA P0 P1 P2 P3 P4 P5 P6 + + + + + + + + + + + + + + + + Token Embeddings Segment Embeddings Position Embeddings t. EA P7 + + The bank makes loans [MASK] clients . Loss = -log (P("to" | masked input)) Random masking 20 D ྾ V softmax Embeddings can be learned automatically Advantages ● Automatically learn semantic representations of words! ● These embeddings capture general similarity Disadvantages ● These embeddings capture general similarity Randomly- initialized Large Language Model (>100 million weights)
  • 21. Neural IR: the good, the bad, and the ugly • The good • Big improvements in search quality • Key ingredient: putting words in context • Nice property: zero-shot ranking works • The bad • The ugly 21
  • 22. source We’re making a significant improvement to how we understand queries, representing the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search. Starting from April of this year (2019), we used large transformer models to deliver the largest quality improvements to our Bing customers in the past year. MS Bing Google Search source 22 Search quality improvements: industry
  • 23. Source: Yang et al., SIGIR 2019 Yilmaz et al., 2019; MacAvaney et al. 2019 Li et al., 2020; Nogueira et al., 2020 23 Robust04 Dataset (news articles) Search quality improvements: academia
  • 24. ~8 points! Search quality improvements: academia 24 MS MARCO Passage Dataset (Web pages) Source: MS MARCO Leaderboard 19 point improvement: BM25 vs. best neural 8 point improvement: neural pre-LLM vs. LLM
  • 25. Neural IR: the good, the bad, and the ugly • The good • Big improvements in search quality • Key ingredient: putting words in context • Nice property: zero-shot ranking works • The bad • The ugly 25
  • 26. Key ingredient: contextualization Modern approaches use a LLM (BERT) to create contextualized embeddings 26 Relevant Document Nonrelevant MacAvaney, Yates, Cohan, Goharian. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019.
  • 27. Key ingredient: contextualization Score can be produced by placing document terms in similarity buckets, then computing score based on the size of each bucket 27 sim = 1.0 0.6 < sim < 1.0 sim <= 0.6 Tesla 4 terms 3 terms 20 terms net 1 term 8 terms 40 terms worth 2 terms 2 terms 15 terms MacAvaney, Yates, Cohan, Goharian. CEDR: Contextualized Embeddings for Document Ranking. SIGIR 2019. Guo, Fan, Ai, Croft. A Deep Relevance Matching Model for Ad-hoc Retrieval. CIKM 2016. Query Terms Similarity Buckets Score(Tesla) = 4*a + 3*b + 20*c Total score = Score(Tesla) + Score(net) + Score(worth)
  • 28. Neural IR: the good, the bad, and the ugly • The good • Big improvements in search quality • Key ingredient: putting words in context • Nice property: zero-shot ranking works • The bad • The ugly 28
  • 29. Zero-shot ranking Zero-shot: “this is my first attempt at a task” ➔ We typically think of search as zero-shot If Apple releases iDevice tomorrow, can immediately search for it ➔ In experiments, operationalized as “this is my first attempt at this dataset” 29 A Little Bit Is Worse Than None: Ranking with Limited Training Data. Zhang, Yates, Lin. SustaiNLP 2020.
  • 30. 30 Zero-shot ranking A Little Bit Is Worse Than None: Ranking with Limited Training Data. Zhang, Yates, Lin. SustaiNLP 2020. No improvement training with up to 50% of in-domain data Neural method (zero-shot) Traditional methods
  • 31. Neural IR: the good, the bad, and the ugly • The good • The bad • Which approach is best? Depends on what we measure • Zero-shot effectiveness varies a lot • The ugly 31
  • 32. Many categories of queries are ‘solved’ by current systems 121171: define etruscans 1113256: what is reba mcentire's net worth 701453: what is a statutory deed Basic definition Simple factoid QA Easy description What should we measure? How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
  • 33. But others are more difficult 1037798: who is robert gray 10486: cost of interior concrete flooring Ambiguous Ambiguous and localized 144862: did prohibition increased crime Complex reasoning 507445: symptoms of different types of brain bleeds Concerns multiple entities What should we measure? How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
  • 34. 144862: did prohibition increased crime Complex reasoning 507445: symptoms of different types of brain bleeds Concerns multiple entities Hard + Interesting How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
  • 35. Criteria • Non-Factoid: Not answerable by a single short answer • Beyond single passage: More than a simple definition • Answerable: Answerable solely from the provided context • Text-focused: Non-text answers or calculations should be excluded • Mostly well-formed: Not contain clear spelling or language errors • Possibly Complex: Multiple entities, comparison, complex reasoning, or multiple answers Entity Linking Query Annotations Web Search Engine Annotations TREC Deep Learning Collection Queries Runs What makes a query difficult & interesting?
  • 36. 41% Factoid 10% List 18% Long Answer 24% Web Search 8% Factoid 31% List 27% Long Answer 35% Web Search TREC Deep Learning Collection DL-HARD Collection The query type matters How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021.
  • 37. 41% Factoid 10% List 18% Long Answer 24% Web Search 8% Factoid 31% List 27% Long Answer 35% Web Search TREC Deep Learning Collection DL-HARD Collection The query type matters How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset. Mackie, Dalton, Yates. SIGIR 2021. Swaps in system rankings ● rank 1 → rank 4 ● rank 13 → rank 5 ● bottom → bottom + 1 ● etc.
  • 38. Neural IR: the good, the bad, and the ugly • The good • The bad • Which approach is best? Depends on what we measure • Zero-shot effectiveness varies a lot • The ugly 38
  • 39. Zero-shot: reranking > dense retrieval 39 Reranking approaches compare individual query and document terms
  • 40. 40 s Zero-shot: reranking > dense retrieval Instead of comparing terms, entire query/doc represented by single vector
  • 41. Neural IR: the good, the bad, and the ugly • The good • The bad • The ugly • General similarity: prone to capturing bias and prone to using shortcuts (which do not generalize) • Little control over what knowledge is stored inside the neural model 41
  • 42. 42 Term gender doctor who women dr medical her pregnancy … birth baby number Weight 112 96 71 67 60 55 52 43 … 28 24 7 What’s inside a dense representation? query Tesla’s net worth 0.0 0.2 0.3 2.5 1.7 0.8 query vector Converter term weights tesla worth net price car … … … 32 23 20 18 11 Query: what is the gender of that doctor? Ongoing work with Thong Nguyen, IRLab, UvA.
  • 43. Neural IR: the good, the bad, and the ugly • The good • The bad • The ugly • General similarity: prone to capturing bias and prone to using shortcuts (which do not generalize) • Little control over what knowledge is stored inside the neural model 43
  • 44. 44 Language Models as Knowledge Bases? Petroni, Rocktaeschel, Lewis, Bakhtin, Wu, Miller, Riedel. EMNLP 2019. Language Models As or For Knowledge Bases. Razniewski, Yates, Kassner, Weikum. DL4KG 2021. What’s inside a large language model? Probe a LLM’s word predictions to see how it behaves ➔ Alan Turing was born in [MASK] Advanced LLMs are often correct, but fundamental issues remain • Epistemic: Probabilities ≠ knowledge • Birthplace(Biden) – Scranton (83%), New Castle (0.2%) • Birth country (Bengio) – France (3.5%), Canada (3.2%) • Fallbacks to correlations • Are all Greeks born in Athens? • Are all Panagiotis’ Greek? • Unaware of limits • Microsoft was acquired by … Novell (84%) • Sun Microsystems was acquired by … Oracle (84%)
  • 45. Conclusion • The good: neural methods have significantly improved ranking quality • The bad: so far, no fast & effective "off the shelf" approach for general use • The ugly: is the bad caused by fundamental limitations of this paradigm? 45 Podcast on Neural IR Available: Spotify, iTunes, … https://arxiv.org/abs/2010.06467 https://bit.ly/tr4tr-book
  • 46. Conclusion • The good: neural methods have significantly improved ranking quality • The bad: so far, no fast & effective "off the shelf" approach for general use • The ugly: is the bad caused by fundamental limitations of this paradigm? 46 Podcast on Neural IR Available: Spotify, iTunes, … https://arxiv.org/abs/2010.06467 https://bit.ly/tr4tr-book Thanks!