Exploiting Ranking Factorization Machines for Microblog Retrieval

CIKM 2013

Runwei Qiang, Feng Liang, Jianwu Yang

Institute of Computer Science and Technology, Peking University

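Problem Definition
Real-time Search: "At time t, find tweets about topic X." (TREC 2011)
[Figure: queries Q1, Q2, …, Qn each carry a timestamp, giving pairs (Q1, t1), (Q2, t2), …, (Qn, tn). Tweets from the Tweet Collection are ranked by relevance to each query. One region of the diagram is labeled "Not Available !!".]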

Motivations
IR for microblogs is a non-trivial problem
  Documents are very short
    => severe vocabulary-mismatch problem; how should query expansion techniques be applied?
  Abundance of shortened URLs
    => they offer ways to expand documents, but how can they be used?
  Large quantities of pointless babble
    => how can tweet quality be used to filter non-informative messages?

Motivations
Learning-to-rank methods can make full use of different models or factors in microblog retrieval
  different factors => different features
Many features have proved useful
  Semantic features between query and document
  Tweet quality features, e.g. link, retweet, and mention (count/binary)

Limitations
Features are considered independent
  Some features are closely related to each other.
  RT and @ symbols frequently occur in the same tweet.
Feature utilization
  Link feature: binary => semantic information
  e.g. "Small plane crashes at big airport; no one notices - CNN.com"

Proposal
Employ a Ranking FM framework
  Adopts FM as the ranking function to model interactions between features
Utilize several effective features that are neglected in existing work
Optimize Ranking FM with two optimization methods
  Stochastic Gradient Descent
  Adaptive Regularization

Outline
Ranking FM for Microblog Retrieval
  Ranking FM Framework
  Optimization Methods
Feature Description
Experiments
Summary

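Pairwise approach
Two labeled instances (x_p, y_p) and (x_q, y_q) are turned into one pairwise example ((x_p, x_q), z):

$$
\big( (x_p, y_p), (x_q, y_q) \big) \;\rightarrow\; \big( (x_p, x_q), z \big),
\qquad
z = \begin{cases} +1 & y_p \succ y_q \\ -1 & y_q \succ y_p \end{cases}
$$

Loss function

$$
\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell\big( f;\, (x_p^{(t)}, x_q^{(t)}),\, z^{(t)} \big) + \lambda \lVert \Theta \rVert^2
$$

where f is the FM ranking function, \ell is the hinge loss function, and \lambda \lVert \Theta \rVert^2 is the regularization term.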
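
The pairwise construction and the regularized hinge objective above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `make_pairs` and `pairwise_hinge_loss` are hypothetical, the margin form max(0, 1 - z (f(x_p) - f(x_q))) is the usual Ranking SVM convention and is assumed here, and a plain linear scorer stands in for the FM ranking function.

```python
from itertools import combinations

def make_pairs(instances):
    """Turn graded instances [(x, y), ...] of one query into pairwise examples.

    Returns ((x_p, x_q), z) with z = +1 if y_p > y_q and z = -1 if y_q > y_p.
    Ties carry no preference and are skipped (an assumption of this sketch).
    """
    pairs = []
    for (x_p, y_p), (x_q, y_q) in combinations(instances, 2):
        if y_p == y_q:
            continue
        pairs.append(((x_p, x_q), 1 if y_p > y_q else -1))
    return pairs

def pairwise_hinge_loss(score, pairs, theta, lam):
    """Sum of hinge losses on score differences plus lam * ||theta||^2."""
    data_term = sum(max(0.0, 1.0 - z * (score(x_p) - score(x_q)))
                    for (x_p, x_q), z in pairs)
    reg_term = lam * sum(t * t for t in theta)
    return data_term + reg_term

# Toy usage: a linear scorer stands in for the FM ranking function f.
w = [1.0, 0.5]
linear = lambda x: sum(wi * xi for wi, xi in zip(w, x))
docs = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([0.0, 0.0], 0)]
pairs = make_pairs(docs)
loss = pairwise_hinge_loss(linear, pairs, w, lam=0.1)
```

In the toy usage, three documents with distinct labels yield three pairs, and only the pairs whose score gap is below the margin of 1 contribute to the data term.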


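Factorization Machines Model

$$
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j
$$

The nested interactions use factorized parameters, where k is the factorization dimensionality:

$$
\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}
$$

The model can be rewritten so that it is computed in O(k · n):

$$
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]
$$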
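
To make the equivalence of the two forms concrete, the following sketch evaluates the FM model both ways on a small hand-made example. The function names and the toy parameter values are hypothetical; only the two formulas come from the slide.

```python
def fm_predict_naive(x, w0, w, V):
    """FM score with the explicit double sum over feature pairs, O(k * n^2)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(V[i], V[j]))  # <v_i, v_j>
            y += inner * x[i] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """Same score via the reformulated pairwise term, O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_if x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_if^2 x_i^2
        y += 0.5 * (s * s - s_sq)
    return y

# Toy example: n = 3 features, k = 2 factors.
x = [1.0, 2.0, 3.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
```

Both functions return the same score for this input; the linear form matters when n grows, since the double loop over feature pairs disappears.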


Learn Ranking FM
Grid search on the validation set to find the best λ
  => time-consuming
Adaptive Regularization [2]
  => adapts the regularization automatically
  Training set:

$$
\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(x,y) \in S_T} l\big( \hat{y}(x \mid \Theta^{(t)}),\, y \big) + \lambda^{(t)} \lVert \Theta \rVert^2
$$

  Validation set:

$$
\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \sum_{(x,y) \in S_V} l\big( \hat{y}(x \mid \Theta^{(t+1)}),\, y \big) + \lambda^{(t)} \lVert \Theta \rVert^2
$$
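
The alternation between a parameter step on the training set and a λ step on the validation set can be sketched as follows. This is a simplified illustration in the spirit of [2], not the Ranking FM optimizer: it assumes a plain linear model with squared loss, a single shared λ, and a λ step that differentiates only the validation loss through the latest parameter update. All function names and the synthetic data are hypothetical.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mse(theta, data):
    return sum((dot(theta, x) - y) ** 2 for x, y in data) / len(data)

def adaptive_reg_sgd(train, valid, n, lr=0.01, epochs=20, seed=0):
    """Alternate an SGD step on theta (training set) with a step on lam (validation set)."""
    rng = random.Random(seed)
    theta, lam = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in train:
            # Theta step: squared loss plus lam * ||theta||^2, with lam held fixed.
            prev = theta
            err = dot(prev, x) - y
            theta = [t - lr * (2.0 * err * xi + 2.0 * lam * t)
                     for t, xi in zip(prev, x)]
            # Lam step: the new theta depends on lam through
            # d theta_new / d lam = -2 * lr * prev, so chain through it.
            xv, yv = rng.choice(valid)
            err_v = dot(theta, xv) - yv
            grad_lam = 2.0 * err_v * dot(xv, [-2.0 * lr * p for p in prev])
            lam = max(0.0, lam - lr * grad_lam)  # keep lam non-negative
    return theta, lam

def make_data(m, w_true, rng):
    """Synthetic regression data: y = <w_true, x> + Gaussian noise."""
    data = []
    for _ in range(m):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        data.append((x, dot(w_true, x) + rng.gauss(0.0, 0.1)))
    return data

rng = random.Random(42)
w_true = [2.0, -1.0, 0.5]
train = make_data(200, w_true, rng)
valid = make_data(50, w_true, rng)
theta, lam = adaptive_reg_sgd(train, valid, n=3)
initial_mse = mse([0.0] * 3, valid)
final_mse = mse(theta, valid)
```

The point of the design is that λ is tuned during a single training run, so no outer grid search over λ is needed.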


Feature Description
Content Relevance Features (3)
  Query & Tweet
  BM25, TFIDF, Language Model Score
Semantic Expansion Features (3x3=9)
  Query & Topic info; Expanded query & Tweet; Expanded query & Topic info
  BM25, TFIDF, Language Model Score
Quality Features (5)
  mention, retweet, hashtag, link binary features
  tweet length
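
As an illustration of the five quality features, the sketch below derives them from raw tweet text. The detection rules are assumptions: the slides do not say how mentions, retweets, hashtags, or links are detected, or how tweet length is measured, so the regular expressions and the whitespace-token length are hypothetical choices. The example tweet and its shortened URL are made up.

```python
import re

def quality_features(tweet):
    """Return [mention, retweet, hashtag, link, length] for one tweet.

    The first four are binary indicators; length counts whitespace-separated
    tokens (an assumption of this sketch).
    """
    has_mention = 1 if re.search(r"@\w+", tweet) else 0
    has_retweet = 1 if re.search(r"\bRT\b", tweet) else 0
    has_hashtag = 1 if re.search(r"#\w+", tweet) else 0
    has_link = 1 if re.search(r"https?://\S+", tweet) else 0
    length = len(tweet.split())
    return [has_mention, has_retweet, has_hashtag, has_link, length]

# Made-up example tweet for illustration only.
example = "RT @cnn: Small plane crashes at big airport http://t.co/abc #news"
features = quality_features(example)
plain = quality_features("hello world")
```

A feature vector like this would be concatenated with the 3 content relevance and 9 semantic expansion scores to form the input of the ranking function.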

Experimental Setup
Dataset
  TREC Tweet11 Corpus: about 2 weeks of Twitter data
  TopicInfo Corpus: title field of link pages
  TREC’11: 50 queries
  TREC’12: 60 queries
Evaluation Metrics
  P@30 & MAP

Summary statistics of Tweet11 Corpus
HTTP Code    Status       # of tweets
200          OK           8,084,724
302          Found        815,794
403          Forbidden    817,273
404          Not Found    868,667
Null         Null         67,011
Searchable                8,900,518

Summary statistics of TopicInfo Corpus
HTTP Code    Status       # of tweets
200          OK           1,225,947
302          Found        688
403          Forbidden    5,050
404          Not Found    92,378
Null         Null         265,468
Searchable                1,226,635
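
For reference, the two evaluation metrics can be computed as in the sketch below. It assumes binary relevance judgments given as 0/1 lists in ranked order; the function names are hypothetical, and this is not the official TREC evaluation tool.

```python
def precision_at_k(ranked_rel, k):
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(ranked_rel[:k]) / k

def average_precision(ranked_rel, num_relevant):
    """Mean of the precision values at the rank of each relevant result retrieved."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0

def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked_rel, num_relevant) pairs."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)

# Toy usage with two queries.
q1 = ([1, 0, 1, 0, 0], 2)   # relevant results at ranks 1 and 3
q2 = ([0, 1], 1)            # relevant result at rank 2
p_at_5 = precision_at_k(q1[0], 5)
ap_q1 = average_precision(*q1)
map_score = mean_average_precision([q1, q2])
```

P@30 is `precision_at_k(ranked_rel, 30)`; if fewer than 30 results are returned, the missing positions count as non-relevant in this sketch.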


Baselines
KL2SFBLoc [3]
  Expanded language model with two-stage query expansion
  Performed very well in the TREC’11 real-time search task
hitURLrun3 [4]
  Uses a logistic regression model to learn a pairwise ranking for microblog retrieval
  Best-performing system in the TREC’12 real-time search task
RSVM_Full
  Ranking SVM with a linear kernel
  Same feature set as Ranking FM

Ranking FM Performance
Metric    KL2SFBLoc    RSVM_Full    hitURLrun3    RFM_FullSGD    RFM_FullAR
P@30      0.2441       0.2616       0.2701        0.2808         0.2746
MAP       0.2506       0.2597       0.2642        0.2694         0.2678

hitURLrun3 is the TREC’12 best run; RFM_FullSGD and RFM_FullAR are the Ranking FM runs.
RFM_FullSGD improves P@30 by about 7% over RSVM_Full and by about 4% over hitURLrun3.

Feature Study
[Figure: P@N versus N (N from 0 to 30) for Ranking FM with k=3 optimized by SGD. Curves: Full; -Quality; -Document Expansion; -Query Expansion; -Content Relevance; Only Content Relevance.]

Influence of the hyper-parameter k
[Figure: two panels for RFM_FullSGD, P@30 versus k and MAP versus k, with k from 0 to 15. Ranking FM optimized by SGD.]

Stochastic gradient descent vs. adaptive regularization
[Figure: training time (s, axis scaled by 10^4) versus k, with k from 0 to 15.]

Method         P@5       P@10      P@30      MAP
RFM_FullSGD    0.4068    0.3695    0.2808    0.2694
RFM_FullAR     0.4034    0.3678    0.2746    0.2678

Summary
Pairwise approach
Use Factorization Machines as ranking function
Two optimization methods: Stochastic Gradient Descent and Adaptive Regularization
Three groups of features
  Content Relevance Features
  Semantic Expansion Features
  Quality Features

References
[1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC 2011 Microblog Track. In Proceedings of TREC 2011, 2012.
[2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pages 133–142. ACM, 2012.
[3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. In JCDL ’12, pages 267–276. ACM, 2012.
[4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. HIT at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.
