This paper investigates the role of Distributional
Semantic Models (DSMs) in Question Answering (QA), and
specifically in a QA system called QuestionCube. QuestionCube is
a framework for QA that combines several techniques to retrieve
passages containing the exact answers for natural language questions.
It exploits Information Retrieval models to seek candidate
answers and Natural Language Processing algorithms for the
analysis of questions and candidate answers both in English and
Italian. The data source for the answer is an unstructured text
document collection stored in search indices.
In this paper we propose to exploit DSMs in the QuestionCube
framework. In DSMs, words are represented as mathematical
points in a geometric space, also known as a semantic space. Words
are similar if they are close in that space. Our idea is that
DSM approaches can help to compute the relatedness between users’
questions and candidate answers by exploiting paradigmatic
relations between words. Results of an experimental evaluation
carried out on the CLEF 2010 QA dataset prove the effectiveness of
the proposed approach.
Exploiting Distributional Semantic Models in Question Answering
1. Exploiting Distributional Semantic
Models in Question Answering
Piero Molino1,2, Pierpaolo Basile1,2,
Annalina Caputo1, Pasquale Lops1, Giovanni Semeraro1
{basilepp@di.uniba.it}
1Dept. of Computer Science, University of Bari Aldo Moro
2QuestionCube s.r.l.
6th IEEE International Conference on Semantic Computing
September 19-21, 2012 Palermo, Italy
2. Background
A bottle of Tesgüino is on the table.
Everyone likes Tesgüino.
Tesgüino makes you drunk.
We make Tesgüino out of corn.
3. Background
A bottle of Tesgüino is on the table.
Everyone likes Tesgüino.
Tesgüino makes you drunk.
We make Tesgüino out of corn.
It is a corn beer
4. Background
You shall know a word by
the company it keeps!
The meaning of a word is
determined by its usage
5. Background
Distributional Semantic Models (DSMs)
– represent words as points in a geometric space
– do not require specific text operations
– corpus/language independent
Widely used in IR and Computational Linguistics
– semantic text similarity, synonym detection, query
expansion, …
Not directly used in Question Answering (QA)
6. Our idea
Using DSMs in a QA system
– QuestionCube: a framework for QA
Exploit DSMs
– compute the semantic similarity between the user’s
question and candidate answers
7. Our idea
Using DSMs in a QA system
– QuestionCube: a framework for QA
Exploit DSMs
– compute the semantic similarity between the user’s
question and candidate answers
Q: Which beverages contain alcohol?
A: Tesgüino makes you drunk.
8. Outline
• Introduction to the QA problem
• Distributional Semantic Models
– semantic composition
• DSMs into QuestionCube
• Evaluation and results
• Final remarks
9. QA vs. IR
                          Question Answering                 Information Retrieval
Information need          Natural language question          Keywords
Result                    Factoid answer or short passages   Whole documents
Information acquisition   Reading answers                    Searching manually inside documents
11. Distributional Semantic Models…
Defined as: <T, C, R, W, M, d, S>
– T: target elements (words)
– C: context
– R: relation between T and C
– W: weighting schema
– M: geometric space T×C
– d: space reduction M -> M’
– S: similarity function defined in M’
12. …Distributional Semantic Models
Term-term co-occurrence matrix: each cell
contains the co-occurrences between two terms
within a fixed distance
            dog   cat   computer   animal   mouse
dog           0     4          0        2       1
cat           4     0          0        3       5
computer      0     0          0        0       3
animal        2     3          0        0       2
mouse         1     5          3        2       0
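For concreteness, a minimal sketch of how such a matrix can be built; tokenization, stop-word removal and weighting are omitted, and the example sentence is illustrative:

```python
from collections import defaultdict

def build_ttm(tokens, window=4):
    """Count co-occurrences of term pairs within a fixed window."""
    ttm = defaultdict(int)
    for i, term in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                ttm[(term, tokens[j])] += 1
    return ttm

tokens = "the cat chases the mouse while the dog sleeps".split()
ttm = build_ttm(tokens, window=4)
print(ttm[("cat", "mouse")])  # 1: they co-occur once within distance 4
```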
13. …Distributional Semantic Models
TTM: term-term co-occurrence matrix
Latent Semantic Analysis (LSA): relies on the
Singular Value Decomposition (SVD) of the co-
occurrence matrix
Random Indexing (RI): based on Random
Projection
Latent Semantic Analysis over Random Indexing
(LSARI)
14. Latent Semantic Analysis
Given the TTM matrix M
– apply the SVD decomposition M = UΣV^T
• Σ is the matrix of singular values
– choose the k largest singular values from Σ and
approximate the matrix M
• M* = U*Σ*V*^T
– SVD
• discovers higher-order relations between terms
• reduces the sparseness of the original matrix
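A minimal sketch of the truncated SVD step with NumPy, reusing the toy co-occurrence matrix from slide 12 (the choice k = 2 is illustrative):

```python
import numpy as np

# Toy term-term co-occurrence matrix from slide 12
M = np.array([[0, 4, 0, 2, 1],
              [4, 0, 0, 3, 5],
              [0, 0, 0, 0, 3],
              [2, 3, 0, 0, 2],
              [1, 5, 3, 2, 0]], dtype=float)

U, s, Vt = np.linalg.svd(M)                  # M = U Σ V^T
k = 2                                        # keep the k largest singular values
M_star = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # rank-k approximation M*
term_vectors = U[:, :k] * s[:k]              # dense, reduced term representations
```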
15. Random Indexing
1. Generate and assign a context vector to each
context element (e.g. document, passage,
term, …)
2. The term vector is the sum of the context
vectors of the contexts in which the term occurs
– optionally, each context vector can be weighted
by a score (e.g. term frequency, PMI, …)
16. Context Vector
0 0 0 0 0 0 0 -1 0 0 0 0 1 0 0 -1 0 1 0 0 0 0 1 0 0 0 0 -1
• sparse
• high dimensional
• ternary {-1, 0, +1}
• small number of randomly distributed non-
zero elements
17. Random Indexing (formal)
B_{n,k} = A_{n,m} R_{m,k},  with k << m
B nearly preserves the
distances between points
(Johnson-Lindenstrauss lemma): d_r ≈ c · d
RI is a locality-sensitive hashing method which approximates the cosine
distance between vectors
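A minimal sketch of the whole RI procedure from slides 15–17, assuming passages as contexts; the dimension and the number of non-zero entries are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def context_vector(dim=1000, nonzeros=10):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=nonzeros, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=nonzeros)
    return v

passages = [["tesguino", "corn", "beer"], ["corn", "bread"]]  # toy contexts
ctx = [context_vector() for _ in passages]  # step 1: one vector per context

term_vectors = {}
for vec, passage in zip(ctx, passages):
    for term in passage:
        # step 2: term vector = sum of the context vectors it occurs in
        # (optionally weighted by tf, PMI, ... as noted on slide 15)
        term_vectors[term] = term_vectors.get(term, np.zeros(len(vec))) + vec
```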
19. Latent Semantic Analysis over Random
Indexing
1. Reduce the dimension of the co-occurrence
matrix using RI
2. Perform LSA over the RI matrix (LSARI)
– reduces LSA computation time: the RI matrix
has fewer dimensions than the co-occurrence
matrix
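A sketch of the combination: project the co-occurrence matrix with RI first, then run the SVD on the much smaller result. The sizes here are toy values; per slide 29, the real setting is 50,000 terms and 1,000 dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 6, 6, 3  # toy sizes; in the evaluation n = m = 50,000, k = 1,000

A = rng.random((n, m))  # co-occurrence matrix (toy random values)
R = rng.choice([-1.0, 0.0, 1.0], size=(m, k), p=[0.05, 0.9, 0.05])  # ternary projection
B = A @ R               # RI step: B has only k columns

# The LSA step is now cheaper: SVD of an n x k matrix instead of n x m
U, s, Vt = np.linalg.svd(B, full_matrices=False)
```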
21. QuestionCube framework…
Natural Language pipeline for Italian and English
– PoS-tagging, lemmatization, Named Entity Recognition,
Phrase Chunking, Dependency Parsing, Word Sense
Disambiguation (WSD)
IR model based on classical VSM or BM25
– several query expansion strategies
Passage retrieval
22. …QuestionCube framework
[Architecture diagram: Question → Question analysis → Question expansion and categorization → Document retrieval (over the Index) → Candidate passages → Passage Filter chain (Term filter, N-gram filter, Density filter, Sequence filter, Score norm) → Ranked list of passages]
24. Integration
[Architecture diagram: the same pipeline, with a new Distributional filter based on DSMs inserted in the Passage Filter chain between the Sequence filter and the Score norm]
25. Distributional filter…
Compute the similarity between the user’s
question and each candidate answer
Both the user’s question and the candidate answer
are represented by more than one term
– a method to compose the word vectors is necessary
27. …Distributional filter (compositional
approach)
Addition (+): pointwise sum of components
– represents complex structures by summing the
vectors of the words which compose them
Given q = q1 q2 … qn and a = a1 a2 … am
– question vector: q = q1 + q2 + … + qn
– candidate answer vector: a = a1 + a2 + … + am
– similarity: sim(q, a) = (q · a) / (‖q‖ ‖a‖)  (cosine)
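A minimal sketch of the addition operator and the similarity, with hypothetical toy term vectors for the Tesgüino example of slide 7:

```python
import numpy as np

def compose(vectors):
    """Addition operator: pointwise sum of the word vectors."""
    return np.sum(vectors, axis=0)

def sim(q, a):
    """Cosine similarity between the composed question and answer vectors."""
    return np.dot(q, a) / (np.linalg.norm(q) * np.linalg.norm(a))

# hypothetical 3-dimensional term vectors
tv = {"beverages": [0.7, 0.1, 0.3], "alcohol": [0.6, 0.2, 0.1],
      "tesguino":  [0.5, 0.3, 0.2], "drunk":   [0.6, 0.1, 0.0]}

q = compose([tv["beverages"], tv["alcohol"]])  # "Which beverages contain alcohol?"
a = compose([tv["tesguino"], tv["drunk"]])     # "Tesgüino makes you drunk."
print(sim(q, a))
```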
29. Evaluation…
Dataset: 2010 CLEF QA Competition
– 10,700 documents
• from European Union legislation and European Parliament
transcriptions
– 200 questions
– Italian and English
DSMs
– vector dimension: 1,000 (TTM/LSA/RI/LSARI)
– vocabulary: the 50,000 most frequent words
– co-occurrence distance: 4
30. ...Evaluation
1. Effectiveness of DSMs in QuestionCube
2. Comparison between the several DSMs
adopted
Metrics
– a@n: accuracy taking into account only the first n
answers
– MRR, based on the rank of the correct answer:
MRR = (1/N) · Σ_{i=1}^{N} 1 / rank_i
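Both metrics are easy to compute given, for each question, the rank of its first correct answer; a sketch, where None marks questions with no correct answer returned:

```python
def accuracy_at_n(ranks, n):
    """a@n: fraction of questions answered correctly within the first n results."""
    return sum(1 for r in ranks if r is not None and r <= n) / len(ranks)

def mrr(ranks):
    """MRR = (1/N) * sum of 1/rank_i; unanswered questions contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

ranks = [1, 3, None, 2]  # illustrative ranks for four questions
print(accuracy_at_n(ranks, 1), mrr(ranks))  # 0.25, 0.458...
```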
31. Evaluation (alone scenario)
Only the distributional filter is adopted: no other
filters in the pipeline
[Diagram: Passage Filter chain with the Term, N-gram, Density and Sequence filters removed; only the Distributional filter and Score norm remain]
32. Evaluation (combined)
The distributional filter is combined with the other
filters
– using the CombSum function
– baseline: the distributional filter is removed
[Diagram: full Passage Filter chain – Term, N-gram, Density, Sequence and Distributional filters, followed by Score norm]
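A sketch of CombSum, assuming a simple min-max score normalization (the score values are hypothetical): each filter’s normalized scores are summed per candidate passage:

```python
def norm(scores):
    """Min-max normalize a {passage_id: score} map to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    return {p: (s - lo) / (hi - lo) if hi > lo else 0.0 for p, s in scores.items()}

def comb_sum(filter_scores):
    """CombSum: sum the normalized score each filter assigns to a passage."""
    combined = {}
    for scores in filter_scores:
        for p, s in norm(scores).items():
            combined[p] = combined.get(p, 0.0) + s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# hypothetical scores from two filters for three passages
print(comb_sum([{"p1": 0.9, "p2": 0.4, "p3": 0.1}, {"p1": 0.2, "p2": 0.7}]))
```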
35. Final remarks…
Alone: all the proposed DSMs are better than
the TTM
– in particular LSA and LSARI
Combined: all the combinations outperform
the baseline
– English +16% (RI/LSA), Italian +26% (LSARI)
Slight difference in performance between LSA
and LSARI
36. …Final remarks
The distributional filter in the alone scenario shows
a higher improvement than in the combined scenario
– find more effective ways to combine filters
(e.g. a learning to rank approach)
Exploit more complex semantic composition
strategies
37. Thank you for your attention!
Questions?
<basilepp@di.uniba.it>