This paper investigates the role of Distributional
Semantic Models (DSMs) in Question Answering (QA), and
specifically in a QA system called QuestionCube. QuestionCube is
a framework for QA that combines several techniques to retrieve
passages containing the exact answers for natural language questions.
It exploits Information Retrieval models to seek candidate
answers and Natural Language Processing algorithms for the
analysis of questions and candidate answers both in English and
Italian. The data source for the answer is an unstructured text
document collection stored in search indices.
In this paper we propose to exploit DSMs in the QuestionCube
framework. In DSMs, words are represented as mathematical
points in a geometric space, also known as a semantic space. Words
are similar if they are close in that space. Our idea is that
DSM approaches can help to compute the relatedness between users’
questions and candidate answers by exploiting paradigmatic
relations between words. Results of an experimental evaluation
carried out on the CLEF 2010 QA dataset prove the effectiveness of
the proposed approach.
Exploiting Distributional Semantic Models in Question Answering
1. Exploiting Distributional Semantic
Models in Question Answering
Piero Molino1,2, Pierpaolo Basile1,2,
Annalina Caputo1, Pasquale Lops1, Giovanni Semeraro1
{basilepp@di.uniba.it}
1Dept. of Computer Science, University of Bari Aldo Moro
2QuestionCube s.r.l.
6th IEEE International Conference on Semantic Computing
September 19-21, 2012 Palermo, Italy
2. Background
A bottle of Tesgüino is on the table.
Everyone likes Tesgüino.
Tesgüino makes you drunk.
We make Tesgüino out of corn.
3. Background
A bottle of Tesgüino is on the table.
Everyone likes Tesgüino.
Tesgüino makes you drunk.
We make Tesgüino out of corn.
It is a corn beer
4. Background
You shall know a word by
the company it keeps!
The meaning of a word is
determined by its usage
5. Background
Distributional Semantic Models (DSMs)
– represent words as points in a geometric space
– do not require specific text operations
– corpus/language independent
Widely used in IR and Computational Linguistics
– semantic text similarity, synonym detection, query
expansion, …
Not directly used in Question Answering (QA)
6. Our idea
Using DSMs in a QA system
– QuestionCube: a framework for QA
Exploit DSMs
– compute the semantic similarity between the user’s
question and candidate answers
7. Our idea
Using DSMs in a QA system
– QuestionCube: a framework for QA
Exploit DSMs
– compute the semantic similarity between the user’s
question and candidate answers
Q: Which beverages contain alcohol?
A: Tesgüino makes you drunk.
8. Outline
• Introduction to the QA problem
• Distributional Semantic Models
– semantic composition
• DSMs into QuestionCube
• Evaluation and results
• Final remarks
9. QA vs. IR
                          Question Answering                 Information Retrieval
Information need          Natural language question          Keywords
Result                    Factoid answer or short passages   Whole documents
Information acquisition   Reading answers                    Searching manually inside documents
11. Distributional Semantic Models…
Defined as: <T, C, R, W, M, d, S>
– T: target elements (words)
– C: context
– R: relation between T and C
– W: weighting schema
– M: geometric space T×C
– d: space reduction M -> M’
– S: similarity function defined in M’
12. …Distributional Semantic Models
Term-term co-occurrence matrix: each cell
contains the co-occurrences between two terms
within a fixed distance
            dog   cat   computer   animal   mouse
dog           0     4          0        2       1
cat           4     0          0        3       5
computer      0     0          0        0       3
animal        2     3          0        0       2
mouse         1     5          3        2       0
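For concreteness, a minimal sketch of how such a matrix can be built; tokenization, stop-word removal and weighting are omitted, and the example sentence is illustrative:

```python
from collections import defaultdict

def build_ttm(tokens, window=4):
    """Count co-occurrences of term pairs within a fixed window."""
    ttm = defaultdict(int)
    for i, term in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                ttm[(term, tokens[j])] += 1
    return ttm

tokens = "the cat chases the mouse while the dog sleeps".split()
ttm = build_ttm(tokens, window=4)
print(ttm[("cat", "mouse")])  # 1: they co-occur once within distance 4
```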
13. …Distributional Semantic Models
TTM: term-term co-occurrence matrix
Latent Semantic Analysis (LSA): relies on the
Singular Value Decomposition (SVD) of the co-
occurrence matrix
Random Indexing (RI): based on Random
Projection
Latent Semantic Analysis over Random Indexing
(LSARI)
14. Latent Semantic Analysis
Given the TTM matrix M
– apply the SVD decomposition M = UΣV^T
• Σ is the matrix of singular values
– choose the k largest singular values from Σ and
approximate the matrix M
• M* = U*Σ*V*^T
– SVD
• discovers higher-order relations between terms
• reduces the sparseness of the original matrix
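A minimal sketch of the truncated SVD step with NumPy, reusing the toy co-occurrence matrix from slide 12 (the choice k = 2 is illustrative):

```python
import numpy as np

# Toy term-term co-occurrence matrix from slide 12
M = np.array([[0, 4, 0, 2, 1],
              [4, 0, 0, 3, 5],
              [0, 0, 0, 0, 3],
              [2, 3, 0, 0, 2],
              [1, 5, 3, 2, 0]], dtype=float)

U, s, Vt = np.linalg.svd(M)                  # M = U Σ V^T
k = 2                                        # keep the k largest singular values
M_star = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # rank-k approximation M*
term_vectors = U[:, :k] * s[:k]              # dense, reduced term representations
```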
15. Random Indexing
1. Generate and assign a context vector to each
context element (e.g. document, passage,
term, …)
2. The term vector is the sum of the context
vectors of the contexts in which the term occurs
– optionally, each context vector can be weighted
by a score (e.g. term frequency, PMI, …)
16. Context Vector
0 0 0 0 0 0 0 -1 0 0 0 0 1 0 0 -1 0 1 0 0 0 0 1 0 0 0 0 -1
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-
zero elements
17. Random Indexing (formal)
B_{n,k} = A_{n,m} R_{m,k},  with k << m
B nearly preserves the
distances between points
(Johnson-Lindenstrauss lemma): d_r ≈ c · d
RI is a locality-sensitive hashing method which approximates the cosine
distance between vectors
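A minimal sketch of the whole RI procedure from slides 15–17, assuming passages as contexts; the dimension and the number of non-zero entries are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def context_vector(dim=1000, nonzeros=10):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=nonzeros, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

passages = [["tesguino", "corn", "beer"], ["corn", "bread"]]  # toy contexts
ctx = [context_vector() for _ in passages]  # step 1: one vector per context

term_vectors = {}
for vec, passage in zip(ctx, passages):
    for term in passage:
        # step 2: term vector = sum of the context vectors it occurs in
        # (optionally weighted by tf, PMI, ... as noted on slide 15)
        term_vectors[term] = term_vectors.get(term, np.zeros(len(vec))) + vec
```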
19. Latent Semantic Analysis over Random
Indexing
1. Reduce the dimension of the co-occurrence
matrix using RI
2. Perform LSA over the RI matrix (LSARI)
– reduces LSA computation time: the RI matrix
has fewer dimensions than the co-occurrence
matrix
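A sketch of the combination: project the co-occurrence matrix with RI first, then run the SVD on the much smaller result. The sizes here are toy values; per slide 29, the real setting is 50,000 terms and 1,000 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 6, 6, 3  # toy sizes; in the evaluation n = m = 50,000, k = 1,000

A = rng.random((n, m))  # co-occurrence matrix (toy random values)
R = rng.choice([-1.0, 0.0, 1.0], size=(m, k), p=[0.05, 0.9, 0.05])  # ternary projection
B = A @ R               # RI step: B has only k columns

# The LSA step is now cheaper: SVD of an n x k matrix instead of n x m
U, s, Vt = np.linalg.svd(B, full_matrices=False)
```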
21. QuestionCube framework…
Natural Language pipeline for Italian and English
– PoS-tagging, lemmatization, Named Entity Recognition,
Phrase Chunking, Dependency Parsing, Word Sense
Disambiguation (WSD)
IR model based on classical VSM or BM25
– several query expansion strategies
Passage retrieval
22. …QuestionCube framework
[Architecture diagram: Question → Question analysis → Question expansion and categorization → Document retrieval (over the Index) → Candidate passages → Passage Filter chain (Term filter, N-gram filter, Density filter, Sequence filter, Score norm) → Ranked list of passages]
24. Integration
[Architecture diagram: the same pipeline, with a new Distributional filter based on DSMs inserted in the Passage Filter chain between the Sequence filter and the Score norm]
25. Distributional filter…
Compute the similarity between the user’s
question and each candidate answer
Both the user’s question and the candidate answer
are represented by more than one term
– a method to compose the word vectors is necessary
27. …Distributional filter (compositional
approach)
Addition (+): pointwise sum of components
– represents complex structures by summing the
vectors of the words which compose them
Given q = q1 q2 … qn and a = a1 a2 … am
– question vector: q = q1 + q2 + … + qn
– candidate answer vector: a = a1 + a2 + … + am
– similarity: sim(q, a) = (q · a) / (‖q‖ ‖a‖)  (cosine)
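A minimal sketch of the addition operator and the similarity, with hypothetical toy term vectors for the Tesgüino example of slide 7:

```python
import numpy as np

def compose(vectors):
    """Addition operator: pointwise sum of the word vectors."""
    return np.sum(vectors, axis=0)

def sim(q, a):
    """Cosine similarity between the composed question and answer vectors."""
    return np.dot(q, a) / (np.linalg.norm(q) * np.linalg.norm(a))

# hypothetical 3-dimensional term vectors
tv = {"beverages": [0.7, 0.1, 0.3], "alcohol": [0.6, 0.2, 0.1],
      "tesguino":  [0.5, 0.3, 0.2], "drunk":   [0.6, 0.1, 0.0]}

q = compose([tv["beverages"], tv["alcohol"]])  # "Which beverages contain alcohol?"
a = compose([tv["tesguino"], tv["drunk"]])     # "Tesgüino makes you drunk."
print(sim(q, a))
```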
29. Evaluation…
Dataset: 2010 CLEF QA Competition
– 10,700 documents
• from European Union legislation and European Parliament
transcriptions
– 200 questions
– Italian and English
DSMs
– vector dimension: 1,000 (TTM/LSA/RI/LSARI)
– vocabulary: the 50,000 most frequent words
– co-occurrence distance: 4
30. ...Evaluation
1. Effectiveness of DSMs in QuestionCube
2. Comparison between the several DSMs
adopted
Metrics
– a@n: accuracy taking into account only the first n
answers
– MRR, based on the rank of the correct answer:
MRR = (1/N) · Σ_{i=1}^{N} 1 / rank_i
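Both metrics are easy to compute given, for each question, the rank of its first correct answer; a sketch, where None marks questions with no correct answer returned:

```python
def accuracy_at_n(ranks, n):
    """a@n: fraction of questions answered correctly within the first n results."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)

def mrr(ranks):
    """MRR = (1/N) * sum of 1/rank_i; unanswered questions contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

ranks = [1, 3, None, 2]  # illustrative ranks for four questions
print(accuracy_at_n(ranks, 1), mrr(ranks))  # 0.25, 0.458...
```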
31. Evaluation (alone scenario)
Only the distributional filter is adopted: no other
filters in the pipeline
[Diagram: Passage Filter chain with the Term, N-gram, Density and Sequence filters removed; only the Distributional filter and Score norm remain]
32. Evaluation (combined)
The distributional filter is combined with the other
filters
– using the CombSum function
– baseline: the distributional filter is removed
[Diagram: full Passage Filter chain – Term, N-gram, Density, Sequence and Distributional filters, followed by Score norm]
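A sketch of CombSum, assuming a simple min-max score normalization (the score values are hypothetical): each filter’s normalized scores are summed per candidate passage:

```python
def norm(scores):
    """Min-max normalize a {passage_id: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    return {p: (s - lo) / (hi - lo) if hi > lo else 0.0 for p, s in scores.items()}

def comb_sum(filter_scores):
    """CombSum: sum the normalized score each filter assigns to a passage."""
    combined = {}
    for scores in filter_scores:
        for p, s in norm(scores).items():
            combined[p] = combined.get(p, 0.0) + s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical scores from two filters for three passages
print(comb_sum([{"p1": 0.9, "p2": 0.4, "p3": 0.1}, {"p1": 0.2, "p2": 0.7}]))
```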
35. Final remarks…
Alone: all the proposed DSMs are better than
the TTM
– in particular LSA and LSARI
Combined: all the combinations outperform
the baseline
– English +16% (RI/LSA), Italian +26% (LSARI)
Slight difference in performance between LSA
and LSARI
36. …Final remarks
The distributional filter in the alone scenario shows
a higher improvement than in the combined scenario
– find more effective ways to combine filters
(e.g. a learning to rank approach)
Exploit more complex semantic composition
strategies
37. Thank you for your attention!
Questions?
<basilepp@di.uniba.it>