Learning to rank methods have been proposed for practical application in information retrieval. When they are employed in microblog retrieval, however, the significant interactions among the features involved are rarely considered. In this paper, we propose a Ranking Factorization Machine (Ranking FM) model, which applies the Factorization Machine model to microblog ranking on the basis of pairwise classification. In this way, the proposed model combines the generality of the learning to rank framework with the advantages of factorization models in estimating interactions between features, leading to better retrieval performance. Three groups of features (content relevance features, semantic expansion features and quality features) and their interactions are used in the Ranking FM model, which is optimized with stochastic gradient descent and adaptive regularization. Experimental results on a real Twitter dataset demonstrate its superiority over several baseline systems in terms of P@30 and MAP. Furthermore, it outperforms the best-performing result in the TREC'12 Real-Time Search Task.
Exploiting Ranking Factorization Machines for Microblog Retrieval
1. Institute of Computer Science and Technology, Peking University
CIKM 2013
Exploiting Ranking Factorization Machines for Microblog Retrieval
Runwei Qiang, Feng Liang, Jianwu Yang
Institute of Computer Science and Technology
Peking University
3. Motivations
IR for microblogs is a non-trivial problem
Documents are very short: the vocabulary-mismatch problem is severe, so how should query expansion be applied?
Shortened URLs are abundant: they offer a way to expand documents, but how can they be used?
Pointless babble is common: how can tweet quality be used to filter out non-informative messages?
4. Motivations
Learning to rank methods can make full use of different models or factors in microblog retrieval
Different factors => different features
Many features have proved useful:
Semantic features between query and document
Tweet quality features, e.g. link, retweet, and mention (count or binary)
5. Limitations
Features are treated as independent
Yet some features are closely related to each other; for example, RT and @ symbols frequently occur in the same tweet.
Feature utilization
Link feature: binary => semantic information
Example of a linked page title: "Small plane crashes at big airport; no one notices - CNN.com"
6. Proposal
Employ a Ranking FM framework
Adopt FM as the ranking function to model interactions between features
Utilize several effective features that are neglected in existing work
Optimize Ranking FM with two optimization methods: Stochastic Gradient Descent and Adaptive Regularization
7. Outline
Ranking FM for Microblog Retrieval
Ranking FM Framework
Optimization Methods
Feature Description
Experiments
Summary
8. Ranking FM Framework
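Pairwise approach
Two labeled documents $(x_p, y_p)$ and $(x_q, y_q)$ are combined into one training instance $(x_p, x_q, z)$, where
$$z = \begin{cases} +1, & y_p > y_q \\ -1, & y_q > y_p \end{cases}$$
Loss function
$$\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell_t\big(f;\, x_p^{(t)}, x_q^{(t)}, z^{(t)}\big) + \lambda \lVert \Theta \rVert^2$$
Here $f$ is the FM ranking function, $\ell_t$ is the hinge loss function, and $\lambda \lVert \Theta \rVert^2$ is the regularization term.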
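The pairwise construction and the regularized objective on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names (`make_pairs`, `hinge_loss`, `objective`) are hypothetical, ties are assumed to be skipped, and the hinge loss is assumed to take the common margin-1 form on the score difference, since the slide only names "Hinge Loss Function".

```python
from itertools import combinations


def make_pairs(docs):
    """Turn [(x, y), ...] for one query into [(x_p, x_q, z), ...].

    z = +1 when y_p > y_q and -1 when y_q > y_p.
    Assumption: pairs with equal labels carry no preference and are skipped.
    """
    pairs = []
    for (x_p, y_p), (x_q, y_q) in combinations(docs, 2):
        if y_p == y_q:
            continue
        pairs.append((x_p, x_q, 1 if y_p > y_q else -1))
    return pairs


def hinge_loss(f, x_p, x_q, z):
    """Assumed margin-1 pairwise hinge loss on the score difference."""
    return max(0.0, 1.0 - z * (f(x_p) - f(x_q)))


def objective(f, pairs, params, lam):
    """Sum of pairwise losses plus lam * ||params||^2."""
    data_term = sum(hinge_loss(f, x_p, x_q, z) for x_p, x_q, z in pairs)
    reg_term = lam * sum(p * p for p in params)
    return data_term + reg_term
```

Any scoring function can be passed as `f`; in Ranking FM it would be the Factorization Machine score.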
9. Factorization Machines Model
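$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$
The nested interactions use factorized parameters:
$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f}\, v_{j,f}$$
where $k$ is the factorization dimensionality.
The model can be rewritten so that it is computed in $O(k \cdot n)$:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2} x_i^{2} \right]$$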
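To make the two forms of the Factorization Machine equation concrete, the sketch below evaluates the model once with the nested pairwise sum and once with the linear-time reformulation, then checks that they agree. It uses plain Python lists; the function names and the random test values are illustrative assumptions, not part of the original slides.

```python
import random


def fm_naive(x, w0, w, V):
    """Direct evaluation with the nested pairwise sum, O(k * n^2)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            y += dot * x[i] * x[j]
    return y


def fm_fast(x, w0, w, V):
    """Reformulated evaluation, O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_if x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_if^2 x_i^2
        y += 0.5 * (s * s - s_sq)
    return y


# Illustrative check on random values (n and k chosen arbitrarily).
rng = random.Random(0)
n, k = 6, 3
x = [rng.random() for _ in range(n)]
w0 = rng.random()
w = [rng.random() for _ in range(n)]
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
assert abs(fm_naive(x, w0, w, V) - fm_fast(x, w0, w, V)) < 1e-9
```

The fast form works because the sum over all ordered pairs, minus the diagonal terms, counts every unordered pair twice, which is where the factor 1/2 comes from.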
10. Learn Ranking FM
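Stochastic Gradient Descent
Grid search on the validation set to find the best λ (time-consuming)
Adaptive Regularization [2]
Adapts the regularization automatically by alternating two steps:
Training set
$$\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(x,y) \in S_T} \ell\big(\hat{y}(x \mid \Theta^{(t)}), y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$$
Validation set
$$\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \sum_{(x,y) \in S_V} \ell\big(\hat{y}(x \mid \Theta^{(t+1)}), y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$$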
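As a rough illustration of the stochastic gradient descent option, the sketch below performs SGD updates of the FM parameters on pairwise instances. Several details are assumptions rather than the authors' settings: the margin-1 hinge loss, applying the full L2 penalty at every step, the learning rate, the λ value, and the toy training pairs. The bias $w_0$ is omitted because it cancels in the score difference.

```python
import random


def fm_score(x, w, V):
    """FM score without the bias w0, which cancels in f(x_p) - f(x_q)."""
    n, k = len(x), len(V[0])
    y = sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


def fm_grad(x, V):
    """Gradients of the FM score with respect to w and V."""
    n, k = len(x), len(V[0])
    sums = [sum(V[i][f] * x[i] for i in range(n)) for f in range(k)]
    g_w = list(x)
    g_V = [[x[i] * sums[f] - V[i][f] * x[i] ** 2 for f in range(k)]
           for i in range(n)]
    return g_w, g_V


def sgd_step(pair, w, V, lr, lam):
    """One in-place SGD update on a single (x_p, x_q, z) instance."""
    x_p, x_q, z = pair
    margin = z * (fm_score(x_p, w, V) - fm_score(x_q, w, V))
    active = margin < 1.0  # the hinge loss has a non-zero gradient only here
    gw_p, gV_p = fm_grad(x_p, V)
    gw_q, gV_q = fm_grad(x_q, V)
    for i in range(len(w)):
        g = -z * (gw_p[i] - gw_q[i]) if active else 0.0
        w[i] -= lr * (g + 2.0 * lam * w[i])
        for f in range(len(V[i])):
            g = -z * (gV_p[i][f] - gV_q[i][f]) if active else 0.0
            V[i][f] -= lr * (g + 2.0 * lam * V[i][f])


# Toy data: in each pair the first vector should be ranked above the second.
rng = random.Random(0)
n, k = 3, 2
w = [0.0] * n
V = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(n)]
pairs = [([1.0, 0.0, 1.0], [0.0, 1.0, 0.0], 1),
         ([1.0, 1.0, 0.0], [0.0, 1.0, 1.0], 1)]
for epoch in range(50):
    for pair in pairs:
        sgd_step(pair, w, V, lr=0.05, lam=0.001)
```

With plain SGD, `lam` would have to be chosen by the grid search mentioned on the slide; adaptive regularization instead adjusts it during training using the validation set.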
11. Feature Description
Content Relevance Features (3)
Query & Tweet
BM25, TFIDF, Language Model Score
Semantic Expansion Features (3x3=9)
Query & Topic info; Expanded query & Tweet; Expanded query & Topic info
BM25, TFIDF, Language Model Score
Quality Features (5)
Binary features: mention, retweet, hashtag, link
Tweet length
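The five quality features could be extracted from raw tweet text roughly as follows. The regular expressions, the function name, and the choice to measure tweet length in characters are all assumptions made for illustration; the slides do not say how these features were computed.

```python
import re


def quality_features(tweet):
    """Return [mention, retweet, hashtag, link, length] for one tweet text.

    Assumptions: the patterns below are simple heuristics, and length is
    counted in characters.
    """
    has_mention = 1 if re.search(r"@\w+", tweet) else 0
    has_retweet = 1 if re.search(r"\bRT\b", tweet) else 0
    has_hashtag = 1 if re.search(r"#\w+", tweet) else 0
    has_link = 1 if re.search(r"https?://\S+", tweet) else 0
    return [has_mention, has_retweet, has_hashtag, has_link, len(tweet)]
```

A factorization model can then learn interactions between these indicators, such as retweet and mention markers appearing together.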
12. Experimental Setup
Dataset
TREC Tweet11 Corpus: about 2 weeks of Twitter data
TopicInfo Corpus: title field of link pages
TREC’11: 50 queries
TREC’12: 60 queries
Evaluation Metrics
P@30 & MAP

Summary statistics of Tweet11 Corpus
HTTP Code               Status      # of tweets
200                     OK          8,084,724
302                     Found       815,794
403                     Forbidden   817,273
404                     Not Found   868,667
Null                    Null        67,011
Searchable (200 + 302)              8,900,518

Summary statistics of TopicInfo Corpus
HTTP Code               Status      # of tweets
200                     OK          1,225,947
302                     Found       688
403                     Forbidden   5,050
404                     Not Found   92,378
Null                    Null        265,468
Searchable (200 + 302)              1,226,635
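For reference, the two evaluation metrics used in these experiments, P@30 and MAP, can be computed as in the sketch below. It assumes binary relevance judgments and uses hypothetical function names; the official TREC evaluation tooling may differ in details such as tie handling.

```python
def precision_at_k(ranked_ids, relevant_ids, k=30):
    """Fraction of the top-k results that are relevant.

    The denominator is k even if fewer than k results were returned.
    """
    top = ranked_ids[:k]
    return sum(1 for d in top if d in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at the rank of each relevant result."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, with the ranking `["a", "b", "c", "d"]` and relevant set `{"a", "c"}`, precision at 2 is 0.5 and average precision is (1/1 + 2/3) / 2.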
13. Baselines
KL2SFBLoc [3]
Expanded language model with two-stage query expansion
Performed very well in the TREC’11 real-time search task
hitURLrun3 [4]
Uses a logistic regression model to learn a pairwise ranking for microblog retrieval
Best-performing system in the TREC’12 real-time search task
RSVM_Full
Ranking SVM with linear kernel
Same feature set as Ranking FM
14. Ranking FM Performance
Metric  KL2SFBLoc  RSVM_Full  hitURLrun3  RFM_FullSGD  RFM_FullAR
P@30    0.2441     0.2616     0.2701      0.2808       0.2746
MAP     0.2506     0.2597     0.2642      0.2694       0.2678
hitURLrun3: TREC’12 best; RFM_FullSGD and RFM_FullAR: Ranking FM
Improvement on P@30: 7% over RSVM_Full, 4% over hitURLrun3
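The 7% and 4% annotations appear to be the relative P@30 gains of RFM_FullSGD over RSVM_Full and over hitURLrun3. That pairing is inferred from the arithmetic rather than stated on the slide, and the short check below reproduces it from the reported scores.

```python
# P@30 values as reported in the results table.
p30 = {
    "KL2SFBLoc": 0.2441,
    "RSVM_Full": 0.2616,
    "hitURLrun3": 0.2701,
    "RFM_FullSGD": 0.2808,
    "RFM_FullAR": 0.2746,
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`."""
    return (new - old) / old


gain_over_rsvm = relative_gain(p30["RFM_FullSGD"], p30["RSVM_Full"])
gain_over_trec_best = relative_gain(p30["RFM_FullSGD"], p30["hitURLrun3"])
print(f"{gain_over_rsvm:.1%} {gain_over_trec_best:.1%}")  # prints "7.3% 4.0%"
```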
15. Feature Study
[Figure: P@N (y-axis, 0.2 to 0.5) against N (x-axis, 0 to 30) for Ranking FM with k=3 optimized by SGD. Curves: Full, -Quality, -Document Expansion, -Query Expansion, -Content Relevance, Only Content Relevance]
16. Influence of the hyper-parameter k
[Figure: two panels showing P@30 and MAP of RFM_FullSGD against the hyper-parameter k (0 to 15); Ranking FM optimized by SGD]
17. Stochastic gradient descent vs. Adaptive regularization
[Figure: training time in seconds (×10^4, 0 to 3) against k (0 to 15) for Stochastic Gradient Descent and Adaptive Regularization]
Method        P@5     P@10    P@30    MAP
RFM_FullSGD   0.4068  0.3695  0.2808  0.2694
RFM_FullAR    0.4034  0.3678  0.2746  0.2678
18. Summary
Ranking FM Framework
Pairwise approach
Use Factorization Machines as ranking function
Two optimization methods
Stochastic Gradient Descent
Adaptive Regularization
Three groups of features
Content Relevance Features
Semantic Expansion Features
Quality Features
19. References
[1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC 2011 Microblog Track. In Proceedings of TREC 2011, 2012.
[2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pages 133–142. ACM, 2012.
[3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. In JCDL ’12, pages 267–276. ACM, 2012.
[4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. HIT at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.