Institute of Computer Science & Technology, Peking University

CIKM 2013
Exploiting Ranking Factorization
Machines for Microblog Retrieval
Runwei Qiang, Feng Liang, Jianwu Yang

Institute of Computer Science and Technology
Peking University

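Problem Definition
[Figure: queries Q1, Q2, …, Qn, each paired with its query time as (Q1, t1), (Q2, t2), …, (Qn, tn), are issued against the Tweet Collection; panel labels "ranking", "timestamp", and "relevance"; annotation "Not Available !!"]

Real-time search (TREC 2011): at time t, find tweets about topic X.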
Motivations
IR for microblogs is a non-trivial problem:
- Documents are very short
  → severe vocabulary mismatch; how can query expansion techniques be applied?
- Shortened URLs are abundant
  → they offer a way to expand documents, but how can they be exploited?
- Large quantities of pointless babble
  → how can tweet quality be used to filter out non-informative messages?

Motivations
Learning-to-rank methods can make full use of the different models or factors in microblog retrieval:
- different factors => different features
- many features have already proven useful:
  - semantic features between query and document
  - tweet quality features, e.g. link, retweet, and mention counts or binary indicators

Limitations
- Features are treated as independent
  - Yet some features are closely related: RT and @ symbols frequently occur in the same tweet.
- Features are under-utilized
  - The link feature is only binary, although the linked page carries semantic information, e.g. the page title "Small plane crashes at big airport; no one notices- CNN.com".

Proposal
- Employ a Ranking FM framework
  - Adopt FM as the ranking function to model interactions between features
- Utilize several effective features that are neglected in existing work
- Optimize Ranking FM with two optimization methods:
  - Stochastic Gradient Descent
  - Adaptive Regularization

Outline
- Ranking FM for Microblog Retrieval
  - Ranking FM Framework
  - Optimization Methods
- Feature Description
- Experiments
- Summary

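Ranking FM Framework
Pairwise approach: two labeled examples $(\mathbf{x}_p, y_p)$ and $(\mathbf{x}_q, y_q)$ with different relevance labels form one training instance $((\mathbf{x}_p, \mathbf{x}_q), z)$, where

$$z = \begin{cases} +1 & \text{if } y_p > y_q \\ -1 & \text{if } y_q > y_p \end{cases}$$

Loss function:

$$\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell_t\big(f; \mathbf{x}_p^{(t)}, \mathbf{x}_q^{(t)}, z^{(t)}\big) + \lambda \lVert \Theta \rVert^2$$

where $f$ is the FM ranking function, $\ell_t$ is the hinge loss function on the $t$-th of $l$ pairs, and $\lambda \lVert \Theta \rVert^2$ is the regularization term.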
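A minimal sketch of how the pairwise instances and the loss could be computed, assuming the Ranking-SVM-style hinge form $\ell = \max(0, 1 - z(f(\mathbf{x}_p) - f(\mathbf{x}_q)))$; the slide names a hinge loss but does not spell out its form. The helpers `make_pairs`, `pairwise_hinge`, and `objective` are hypothetical, and `f` can be any scoring function (a linear stand-in is used in the demo, not an FM).

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector, int]


def make_pairs(docs: List[Tuple[Vector, float]]) -> List[Pair]:
    """Build ((x_p, x_q), z) pairs from the judged tweets of one query.

    Pairs with equal relevance labels carry no ordering information and are skipped.
    """
    pairs: List[Pair] = []
    for (x_p, y_p), (x_q, y_q) in combinations(docs, 2):
        if y_p == y_q:
            continue
        z = 1 if y_p > y_q else -1
        pairs.append((x_p, x_q, z))
    return pairs


def pairwise_hinge(f: Callable[[Vector], float], x_p: Vector, x_q: Vector, z: int) -> float:
    """Assumed hinge form: max(0, 1 - z * (f(x_p) - f(x_q)))."""
    return max(0.0, 1.0 - z * (f(x_p) - f(x_q)))


def objective(f: Callable[[Vector], float], pairs: List[Pair],
              params: Sequence[float], lam: float) -> float:
    """Sum of pairwise hinge losses plus the L2 term lam * ||params||^2."""
    data_loss = sum(pairwise_hinge(f, x_p, x_q, z) for x_p, x_q, z in pairs)
    return data_loss + lam * sum(p * p for p in params)


def first_feature(x: Vector) -> float:
    """Stand-in ranking function used only for the demo below."""
    return x[0]


if __name__ == "__main__":
    # Toy judged tweets for one query: (feature vector, graded relevance).
    docs = [([0.9, 0.1], 2), ([0.2, 0.5], 0), ([0.5, 0.3], 1)]
    pairs = make_pairs(docs)
    print([z for _, _, z in pairs], objective(first_feature, pairs, [1.0, 0.0], 0.1))
```

Only pairs that are ordered wrongly or by less than the margin of 1 contribute to the loss, which is what lets SGD skip most well-separated pairs.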

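Factorization Machines Model

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j, \qquad \langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f}\, v_{j,f}$$

The nested pairwise interactions use factorized parameters $\mathbf{v}_i \in \mathbb{R}^k$, where $k$ is the factorization dimensionality. The model can be rewritten so that it is computed in $O(k \cdot n)$:

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \Big( \sum_{i=1}^{n} v_{i,f}\, x_i \Big)^2 - \sum_{i=1}^{n} v_{i,f}^2\, x_i^2 \right]$$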
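The equivalence of the two forms of $\hat{y}(\mathbf{x})$ can be checked numerically. Below is a minimal pure-Python sketch (function names hypothetical) that evaluates the nested double sum in $O(k \cdot n^2)$ and the reformulation in $O(k \cdot n)$; the toy feature vector, assumed to be (has RT, has @, has link), and all weights are made up for illustration.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def fm_predict_naive(x: Vector, w0: float, w: Vector, V: Matrix) -> float:
    """Direct evaluation of the nested double sum: O(k * n^2)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y


def fm_predict(x: Vector, w0: float, w: Vector, V: Matrix) -> float:
    """Linear-time reformulation: O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))          # sum_i v_{i,f} x_i
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        y += 0.5 * (s * s - s2)
    return y


if __name__ == "__main__":
    # Hypothetical binary features (has RT, has @, has link) with k = 2.
    x = [1.0, 1.0, 0.0]
    w0, w = 0.1, [0.3, -0.2, 0.5]
    V = [[0.5, 0.1], [0.2, -0.3], [0.4, 0.4]]
    print(fm_predict_naive(x, w0, w, V), fm_predict(x, w0, w, V))
```

Both functions return 0.27 for this input. The term 0.07 beyond the linear part is the learned interaction $\langle \mathbf{v}_{RT}, \mathbf{v}_{@} \rangle$, which a purely linear ranker that treats features independently cannot express.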

Learn Ranking FM
- Stochastic Gradient Descent
  - Grid search on a validation set to find the best λ (time-consuming)
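A minimal pure-Python sketch of the SGD route with a grid search over λ. It assumes the hinge form $\max(0, 1 - z(f(\mathbf{x}_p) - f(\mathbf{x}_q)))$ and, because the slides evaluate with P@30 and MAP rather than a pairwise criterion, uses pairwise accuracy as a stand-in validation metric. All function names, the λ grid, the learning rate, and the epoch count are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector, int]


def fm_score(x: Vector, w: List[float], V: List[List[float]]) -> float:
    """FM score without w0 (the global bias cancels in f(x_p) - f(x_q))."""
    n, k = len(x), len(V[0])
    y = sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s2)
    return y


def fm_grad(x: Vector, V: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """dy/dw_i = x_i and dy/dv_{i,f} = x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2."""
    n, k = len(x), len(V[0])
    sums = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    g_V = [[x[i] * sums[f] - V[i][f] * x[i] * x[i] for f in range(k)] for i in range(n)]
    return list(x), g_V


def train_sgd(pairs: List[Pair], n: int, k: int = 3, lr: float = 0.05,
              lam: float = 0.01, epochs: int = 50, seed: int = 0):
    """SGD on the pairwise hinge loss plus lam * ||Theta||^2 for a fixed lam."""
    rng = random.Random(seed)
    w = [0.0] * n
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        for x_p, x_q, z in rng.sample(pairs, len(pairs)):
            margin = z * (fm_score(x_p, w, V) - fm_score(x_q, w, V))
            active = margin < 1.0  # the hinge has a gradient only inside the margin
            gw_p, gV_p = fm_grad(x_p, V)
            gw_q, gV_q = fm_grad(x_q, V)
            for i in range(n):
                g = -z * (gw_p[i] - gw_q[i]) if active else 0.0
                w[i] -= lr * (g + 2 * lam * w[i])
                for f in range(k):
                    g = -z * (gV_p[i][f] - gV_q[i][f]) if active else 0.0
                    V[i][f] -= lr * (g + 2 * lam * V[i][f])
    return w, V


def pairwise_accuracy(pairs: List[Pair], w, V) -> float:
    """Stand-in validation metric: fraction of correctly ordered pairs."""
    ok = sum(1 for x_p, x_q, z in pairs
             if z * (fm_score(x_p, w, V) - fm_score(x_q, w, V)) > 0)
    return ok / len(pairs)


def grid_search(train: List[Pair], valid: List[Pair], n: int,
                grid: Sequence[float] = (0.0, 0.001, 0.01, 0.1)):
    """Train one model per lam (a full SGD run each) and keep the best on validation."""
    best = None
    for lam in grid:
        w, V = train_sgd(train, n, lam=lam)
        acc = pairwise_accuracy(valid, w, V)
        if best is None or acc > best[0]:
            best = (acc, lam, w, V)
    return best
```

The outer loop makes the cost of this route explicit: every candidate λ requires a complete training run.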
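- Adaptive Regularization [2]: adapts the regularization automatically by alternating between the training set $S_T$ and the validation set $S_V$
  - Training set:
    $$\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(\mathbf{x}, y) \in S_T} l\big(\hat{y}(\mathbf{x} \mid \Theta), y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$$
  - Validation set:
    $$\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \sum_{(\mathbf{x}, y) \in S_V} l\big(\hat{y}(\mathbf{x} \mid \Theta^{(t+1)}), y\big)$$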
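A deliberately simplified sketch of the alternating scheme of [2]. It assumes a linear model with squared loss instead of the pairwise FM, and single SGD steps in place of each arg min. The λ step differentiates the validation loss through the preceding parameter step, using $\partial \Theta^{(t+1)} / \partial \lambda = -2\alpha \Theta^{(t)}$ for learning rate $\alpha$. Function names, step sizes, and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    return sum(t * xi for t, xi in zip(theta, x))


def sq_loss_grad(theta: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of 0.5 * (theta . x - y)^2 with respect to theta."""
    r = predict(theta, x) - y
    return [r * xi for xi in x]


def sgd_adaptive_reg(train: List[Example], valid: List[Example], dim: int,
                     lr: float = 0.05, lr_lam: float = 0.1, lam0: float = 0.1,
                     epochs: int = 100, seed: int = 0):
    """Alternate one parameter step (training set) with one lambda step (validation set)."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    lam = lam0
    for _ in range(epochs):
        for x, y in rng.sample(train, len(train)):
            # (1) Parameter step on a training example, with lambda held fixed.
            g = sq_loss_grad(theta, x, y)
            prev = theta
            theta = [t - lr * (gi + 2 * lam * t) for t, gi in zip(theta, g)]
            # (2) Lambda step on a validation example. The new theta depends on
            #     lambda through step (1): d theta_new / d lambda = -2 * lr * prev.
            xv, yv = rng.choice(valid)
            gv = sq_loss_grad(theta, xv, yv)
            d_lam = sum(gvi * (-2 * lr * p) for gvi, p in zip(gv, prev))
            lam = max(0.0, lam - lr_lam * d_lam)  # keep the regularizer non-negative
    return theta, lam
```

On noise-free toy data the validation loss favors less shrinkage, so λ is driven down from its starting value without any grid search.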

Feature Description
- Content Relevance Features (3)
  - Query & Tweet: BM25, TF-IDF, Language Model score
- Semantic Expansion Features (3 × 3 = 9)
  - Query & Topic info; Expanded query & Tweet; Expanded query & Topic info
  - each pair scored with BM25, TF-IDF, Language Model score
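To illustrate one of these relevance scores, here is a minimal Okapi BM25 sketch. The slides give neither the parameters nor the IDF variant, so $k_1 = 1.2$, $b = 0.75$, and the non-negative IDF $\log(1 + (N - df + 0.5)/(df + 0.5))$ are assumptions; the same function would be applied to each (query, text) combination listed above, e.g. an expanded query against topic info.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores
```

Because tweets are so short, the length-normalization term varies little between documents, which is part of why the expansion features are needed.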

- Quality Features (5)
  - mention, retweet, hashtag, link (binary features)
  - tweet length
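A minimal sketch of the five quality features. How mentions, retweets, hashtags, and links are detected, and whether length is counted in characters or tokens, is not stated on the slide; the regular expressions and the token-based length below are assumptions, and the example tweet is made up.

```python
import re
from typing import Dict

# Hypothetical detection rules for the four binary indicators.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
LINK = re.compile(r"https?://\S+")
RETWEET = re.compile(r"\bRT\b")


def quality_features(text: str) -> Dict[str, float]:
    """Four binary indicators plus tweet length in whitespace-separated tokens."""
    return {
        "mention": float(MENTION.search(text) is not None),
        "retweet": float(RETWEET.search(text) is not None),
        "hashtag": float(HASHTAG.search(text) is not None),
        "link": float(LINK.search(text) is not None),
        "length": float(len(text.split())),
    }


if __name__ == "__main__":
    print(quality_features(
        "RT @cnn Small plane crashes at big airport; no one notices http://t.co/abc #news"))
```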
Experimental Setup
- Dataset
  - TREC Tweet11 Corpus: about 2 weeks of Twitter data
  - TopicInfo Corpus: title field of link pages
  - TREC'11: 50 queries; TREC'12: 60 queries
- Evaluation Metrics: P@30 & MAP

Summary statistics of Tweet11 Corpus
HTTP Code | Status | # of tweets
200 | OK | 8,084,724
302 | Found | 815,794
403 | Forbidden | 817,273
404 | Not Found | 868,667
Null | Null | 67,011
Searchable | | 8,900,518

Summary statistics of TopicInfo Corpus
HTTP Code | Status | # of tweets
200 | OK | 1,225,947
302 | Found | 688
403 | Forbidden | 5,050
404 | Not Found | 92,378
Null | Null | 265,468
Searchable | | 1,226,635
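Both metrics are normally computed with trec_eval. A minimal sketch following the usual definitions (P@k divides by k even when fewer than k tweets are returned; average precision divides by the total number of relevant tweets for the query) is below; function names and the toy run are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 30) -> float:
    """Fraction of the top-k ranked tweet ids that are relevant (P@30 by default)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of P@i over the ranks i of relevant tweets, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a query with no returned tweets scores 0."""
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)
```

For example, a run `["d1", "d2", "d3", "d4"]` against relevant set `{"d1", "d3", "d5"}` has P@4 = 0.5 and AP = (1/1 + 2/3) / 3 = 5/9, since the unretrieved `d5` still counts in the denominator.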

Baselines
- KL2SFBLoc [3]
  - Expanded language model with two-stage query expansion
  - Performed very well in the TREC'11 real-time search task
- hitURLrun3 [4]
  - Uses a logistic regression model to learn a pairwise ranking for microblog retrieval
  - Best-performing system in the TREC'12 real-time search task
- RSVM_Full
  - Ranking SVM with a linear kernel
  - Same feature set as used by Ranking FM

Ranking FM Performance
Metric | KL2SFBLoc | RSVM_Full | hitURLrun3 (TREC'12 best) | RFM_FullSGD (Ranking FM) | RFM_FullAR (Ranking FM)
P@30 | 0.2441 | 0.2616 | 0.2701 | 0.2808 | 0.2746
MAP | 0.2506 | 0.2597 | 0.2642 | 0.2694 | 0.2678

RFM_FullSGD improves P@30 by about 7% over RSVM_Full and by about 4% over hitURLrun3.
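The 7% and 4% P@30 improvements are consistent with the relative gains of RFM_FullSGD over RSVM_Full and over hitURLrun3 in the table; a few lines of Python make the arithmetic explicit (the helper name is hypothetical).

```python
# P@30 values as reported in the Ranking FM Performance table.
P30 = {
    "KL2SFBLoc": 0.2441,
    "RSVM_Full": 0.2616,
    "hitURLrun3": 0.2701,
    "RFM_FullSGD": 0.2808,
    "RFM_FullAR": 0.2746,
}


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, e.g. 0.05 means +5%."""
    return (new - base) / base


if __name__ == "__main__":
    for base in ("KL2SFBLoc", "RSVM_Full", "hitURLrun3"):
        gain = relative_gain(P30["RFM_FullSGD"], P30[base])
        print(f"RFM_FullSGD vs {base}: {gain:+.1%}")
```

Run as a script, it prints gains of +15.0%, +7.3%, and +4.0% over KL2SFBLoc, RSVM_Full, and hitURLrun3 respectively.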

Feature Study
[Figure: P@N versus N (up to N = 30) for the Full feature set and for reduced sets: -Quality, -Document Expansion, -Query Expansion, -Content Relevance, and Only Content Relevance. Ranking FM with k = 3, optimized by SGD.]

Influence of the hyper-parameter k
[Figure: two panels, MAP versus k and P@30 versus k, for RFM_FullSGD with k from 0 to 15. Ranking FM optimized by SGD.]

Stochastic gradient descent vs. adaptive regularization
[Figure: training time (s, ×10^4) versus k for Stochastic Gradient Descent and Adaptive Regularization.]

Method | P@5 | P@10 | P@30 | MAP
RFM_FullSGD | 0.4068 | 0.3695 | 0.2808 | 0.2694
RFM_FullAR | 0.4034 | 0.3678 | 0.2746 | 0.2678

Summary
- Ranking FM framework
  - Pairwise approach
  - Uses Factorization Machines as the ranking function
- Two optimization methods
  - Stochastic Gradient Descent
  - Adaptive Regularization
- Three groups of features
  - Content Relevance Features
  - Semantic Expansion Features
  - Quality Features

References
[1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC 2011 Microblog Track. In Proceedings of TREC 2011, 2012.
[2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM '12), pages 133–142. ACM, 2012.
[3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. In Proceedings of JCDL '12, pages 267–276. ACM, 2012.
[4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. HIT at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.

The Future of Software Development - Devin AI Innovative Approach.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Exploiting Ranking Factorization Machines for Microblog Retrieval

  • 1. Institute of Computer Science & Technology, Peking University. CIKM 2013. Exploiting Ranking Factorization Machines for Microblog Retrieval. Runwei Qiang, Feng Liang, Jianwu Yang, Institute of Computer Science and Technology, Peking University.
  • 2. Problem Definition. Given queries Q1, Q2, …, Qn, each issued at a timestamp, i.e. (Q1, t1), (Q2, t2), …, (Qn, tn), rank the tweets in the collection by their relevance to the query. This is the TREC 2011 real-time search task: "At time t, find tweets about topic X." Tweets posted after the query time t are not available.
  • 3. Motivations. IR for microblogs is a non-trivial problem. Documents are very short, which causes a severe vocabulary-mismatch problem: how should query expansion be applied? Shortened URLs are abundant and offer a way to expand documents, but how can they be exploited? There are large quantities of pointless babble: how can tweet quality be used to filter out non-informative messages?
  • 4. Motivations. Learning-to-rank methods can make full use of the different models or factors in microblog retrieval: different factors become different features. Many features have been proven useful, such as semantic features between query and document, and tweet quality features, e.g. link, retweet, and mention counts or binary indicators.
  • 5. Limitations. Features are treated as independent, yet some are closely related: RT and @ symbols frequently occur in the same tweet. Feature utilization: the link feature is used only as a binary indicator, although the linked page carries semantic information that could be exploited, e.g. the page title "Small plane crashes at big airport; no one notices- CNN.com".
  • 6. Proposal. Employ a Ranking FM framework, which adopts FM as the ranking function to model interactions between features. Utilize several effective features that are neglected in existing work. Optimize Ranking FM with two optimization methods: Stochastic Gradient Descent and Adaptive Regularization.
  • 7. Outline. Ranking FM for Microblog Retrieval (Ranking FM Framework, Optimization Methods, Feature Description); Experiments; Summary.
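  • 8. Ranking FM Framework. Pairwise approach: two labeled examples $(\mathbf{x}_p, y_p)$ and $(\mathbf{x}_q, y_q)$ for the same query form a pair $(\mathbf{x}_p, \mathbf{x}_q, z)$ with $z = +1$ if $y_p > y_q$ and $z = -1$ if $y_q > y_p$. Loss function: $\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell\big(f; \mathbf{x}_p^{(t)}, \mathbf{x}_q^{(t)}, z^{(t)}\big) + \lambda \lVert \Theta \rVert^2$, where $f$ is the FM ranking function, $\ell$ is the hinge loss, $l$ is the number of training pairs, and $\lambda \lVert \Theta \rVert^2$ is the regularization term.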
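  • 9. Factorization Machines. Model: $\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$, where the nested sum models pairwise interactions through the factorized parameters $\langle \mathbf{v}_i, \mathbf{v}_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}$ and $k$ is the factorization dimensionality. The model can be rewritten as $\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \Big[ \big( \sum_{i=1}^{n} v_{i,f} x_i \big)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \Big]$, so a prediction costs $O(k \cdot n)$.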
  • 10. Learning Ranking FM. Stochastic Gradient Descent: the regularization value $\lambda$ is found by grid search on a validation set, which is time-consuming. Adaptive Regularization [2] adapts the regularization automatically by alternating between the training set $S_T$ and the validation set $S_V$: $\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(\mathbf{x}, y) \in S_T} \ell\big(\hat{y}(\mathbf{x} \mid \Theta), y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$ and $\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \sum_{(\mathbf{x}, y) \in S_V} \ell\big(\hat{y}(\mathbf{x} \mid \Theta^{(t+1)}), y\big)$, where $\Theta^{(t+1)}$ depends on $\lambda$ through the parameter update step.
  • 11. Feature Description. Content Relevance Features (3): BM25, TF-IDF, and language model scores between query and tweet. Semantic Expansion Features (3 × 3 = 9): BM25, TF-IDF, and language model scores for query & topic info, expanded query & tweet, and expanded query & topic info. Quality Features (5): binary mention, retweet, hashtag, and link features, plus tweet length.
  • 12. Experimental Setup. Dataset: the TREC Tweet11 corpus (about 2 weeks of Twitter data) with the 50 TREC'11 queries and the 60 TREC'12 queries, and a TopicInfo corpus built from the title field of linked pages. Evaluation metrics: P@30 and MAP. Summary statistics of the Tweet11 corpus (tweets per HTTP status): 200 OK 8,084,724; 302 Found 815,794; 403 Forbidden 817,273; 404 Not Found 868,667; Null 67,011; searchable 8,900,518. Summary statistics of the TopicInfo corpus (tweets per HTTP status): 200 OK 1,225,947; 302 Found 688; 403 Forbidden 5,050; 404 Not Found 92,378; Null 265,468; searchable 1,226,635.
  • 13. Baselines. KL2SFBLoc [3]: an expanded language model with two-stage query expansion; it performed very well in the TREC'11 real-time search task. hitURLrun3 [4]: uses a logistic regression model to learn a pairwise ranking for microblog retrieval; it was the best-performing system in the TREC'12 real-time search task. RSVM_Full: Ranking SVM with a linear kernel, using the same feature set as Ranking FM.
  • 14. Ranking FM Performance. P@30: KL2SFBLoc 0.2441, RSVM_Full 0.2616, hitURLrun3 (TREC'12 best) 0.2701, RFM_FullSGD 0.2808, RFM_FullAR 0.2746. MAP: KL2SFBLoc 0.2506, RSVM_Full 0.2597, hitURLrun3 0.2642, RFM_FullSGD 0.2694, RFM_FullAR 0.2678. On P@30, the Ranking FM run RFM_FullSGD improves by about 7% over RSVM_Full and by about 4% over hitURLrun3, the best TREC'12 system.
  • 15. Feature Study. [Figure: P@N for N from 0 to 30, Ranking FM with k = 3 optimized by SGD, comparing the feature sets Full, -Quality, -Document Expansion, -Query Expansion, -Content Relevance, and Only Content Relevance.]
  • 16. Influence of the hyper-parameter k. [Figure: P@30 and MAP of RFM_FullSGD as functions of k (0 to 15), Ranking FM optimized by SGD.]
  • 17. Stochastic Gradient Descent vs. Adaptive Regularization. [Figure: training time (s) against k (0 to 15) for Stochastic Gradient Descent and Adaptive Regularization.] P@5 / P@10 / P@30 / MAP: RFM_FullSGD 0.4068 / 0.3695 / 0.2808 / 0.2694; RFM_FullAR 0.4034 / 0.3678 / 0.2746 / 0.2678.
  • 18. Summary. Ranking FM framework: a pairwise approach that uses Factorization Machines as the ranking function. Two optimization methods: Stochastic Gradient Descent and Adaptive Regularization. Three groups of features: Content Relevance Features, Semantic Expansion Features, and Quality Features.
  • 19. References. [1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC 2011 Microblog Track. In Proceedings of TREC 2011, 2012. [2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (WSDM '12), pages 133–142. ACM, 2012. [3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. In Proceedings of JCDL '12, pages 267–276. ACM, 2012. [4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. Hit at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.
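The pairwise transformation and loss on slide 8 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, ties between equally graded tweets are simply skipped, and the hinge loss is assumed to use the conventional unit margin, max(0, 1 - z (f(x_p) - f(x_q))), since the slide names the hinge loss but not its exact form.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector, int]  # (x_p, x_q, z)


def make_pairs(docs: List[Tuple[Vector, float]]) -> List[Pair]:
    """Build (x_p, x_q, z) pairs from the graded tweets of ONE query.

    z = +1 if y_p > y_q and z = -1 if y_p < y_q; equally graded tweets
    produce no pair (an assumption: the slide does not cover ties).
    """
    pairs: List[Pair] = []
    for (x_p, y_p), (x_q, y_q) in combinations(docs, 2):
        if y_p > y_q:
            pairs.append((x_p, x_q, 1))
        elif y_p < y_q:
            pairs.append((x_p, x_q, -1))
    return pairs


def pairwise_objective(f: Callable[[Vector], float], pairs: List[Pair],
                       theta_sq_norm: float, lam: float) -> float:
    """Sum of unit-margin hinge losses over all pairs plus lam * ||theta||^2."""
    loss = sum(max(0.0, 1.0 - z * (f(x_p) - f(x_q))) for x_p, x_q, z in pairs)
    return loss + lam * theta_sq_norm
```

For example, `make_pairs([([1.0], 2), ([0.0], 0), ([0.5], 1)])` yields three pairs with z = +1, +1, -1; a scorer such as `lambda x: 2.0 * x[0]` orders all of them with a margin of at least 1, so only the regularization term remains in the objective.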
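The two forms of the FM model on slide 9 give the same prediction; only the cost differs. The sketch below evaluates both in plain Python so the O(k · n) reformulation can be checked against the direct O(k · n²) double sum. The function names and the list-of-lists layout of V (one row of k factors per feature) are illustrative assumptions.

```python
from typing import List, Sequence


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: List[List[float]]) -> float:
    """Direct evaluation of the FM model with the nested pairwise sum: O(k * n^2)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))  # <v_i, v_j>
            y += dot * x[i] * x[j]
    return y


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: List[List[float]]) -> float:
    """Same model via the reformulated interaction term: O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_{i,f} x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        y += 0.5 * (s * s - s_sq)
    return y
```

With x = [1, 1], zero linear weights and V = [[1], [2]], both functions return the single interaction ⟨v_1, v_2⟩ x_1 x_2 = 2.0. The identity works because squaring the per-factor sum produces every cross term twice plus the squared diagonal terms, which are subtracted before halving.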
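Slide 10 trains Ranking FM with stochastic gradient descent. The class below is a minimal pure-Python sketch of pairwise SGD for an FM scorer, under assumptions the slides do not pin down: a unit-margin hinge loss, L2 regularization applied at every step, Gaussian initialization of V with standard deviation 0.01, and a fixed λ (per the slide, λ would in practice be chosen by grid search on a validation set). The names `RankingFM`, `fit` and `hinge_loss` are hypothetical. It relies on the standard FM derivatives ∂ŷ/∂w_i = x_i and ∂ŷ/∂v_{i,f} = x_i Σ_j v_{j,f} x_j - v_{i,f} x_i².

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float], int]  # (x_p, x_q, z)


class RankingFM:
    """Pairwise Ranking FM trained with plain SGD (illustrative sketch).

    The global bias w0 is omitted because it cancels in f(x_p) - f(x_q).
    """

    def __init__(self, n_features: int, k: int, init_std: float = 0.01, seed: int = 0):
        rng = random.Random(seed)
        self.k = k
        self.w = [0.0] * n_features
        self.V = [[rng.gauss(0.0, init_std) for _ in range(k)] for _ in range(n_features)]

    def score(self, x: Sequence[float]) -> float:
        """O(k * n) FM score without w0."""
        y = sum(wi * xi for wi, xi in zip(self.w, x))
        for f in range(self.k):
            s = sum(self.V[i][f] * x[i] for i in range(len(x)))
            s_sq = sum((self.V[i][f] * x[i]) ** 2 for i in range(len(x)))
            y += 0.5 * (s * s - s_sq)
        return y

    def _grad(self, x: Sequence[float]):
        """Gradients of score(x): d/dw_i = x_i, d/dv_{i,f} = x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2."""
        sums = [sum(self.V[j][f] * x[j] for j in range(len(x))) for f in range(self.k)]
        g_V = [[x[i] * sums[f] - self.V[i][f] * x[i] * x[i] for f in range(self.k)]
               for i in range(len(x))]
        return list(x), g_V

    def hinge_loss(self, pairs: List[Pair]) -> float:
        return sum(max(0.0, 1.0 - z * (self.score(xp) - self.score(xq))) for xp, xq, z in pairs)

    def fit(self, pairs: List[Pair], epochs: int = 50, lr: float = 0.1,
            lam: float = 0.001, seed: int = 0) -> "RankingFM":
        rng = random.Random(seed)
        order = list(range(len(pairs)))
        for _ in range(epochs):
            rng.shuffle(order)  # visit pairs in random order each epoch
            for t in order:
                x_p, x_q, z = pairs[t]
                active = 1.0 - z * (self.score(x_p) - self.score(x_q)) > 0.0
                # Gradients are taken at the current parameters, before any update.
                gw_p, gV_p = self._grad(x_p)
                gw_q, gV_q = self._grad(x_q)
                for i in range(len(self.w)):
                    # Hinge subgradient: -z * (grad f(x_p) - grad f(x_q)) if the margin is violated.
                    g = -z * (gw_p[i] - gw_q[i]) if active else 0.0
                    self.w[i] -= lr * (g + 2.0 * lam * self.w[i])
                    for f in range(self.k):
                        g = -z * (gV_p[i][f] - gV_q[i][f]) if active else 0.0
                        self.V[i][f] -= lr * (g + 2.0 * lam * self.V[i][f])
        return self
```

A toy usage: with pairs in which the first feature decides relevance, e.g. `[([1.0, 0.0], [0.0, 0.0], 1), ([0.0, 1.0], [1.0, 1.0], -1), ([1.0, 1.0], [0.0, 1.0], 1)]`, calling `RankingFM(n_features=2, k=2).fit(pairs)` drives the hinge loss from about 3 towards 0 by growing the weight of that feature.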
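The adaptive regularization step on slide 10 ([2]) alternates a Θ update on the training set with a λ update on the validation set. Below is a sketch of one such alternation for a generic differentiable model, assuming a single shared λ, plain gradient steps, and caller-supplied gradient functions `grad_train` and `grad_val` (hypothetical names); it is not the authors' implementation. The key detail is that the updated parameters Θ' = Θ - α(∇ℓ + 2λΘ) depend on λ, so ∂Θ'/∂λ = -2αΘ and the chain rule yields the gradient of the validation loss with respect to λ.

```python
from typing import Callable, List, Tuple

Grad = Callable[[List[float]], List[float]]


def sgda_step(theta: List[float], lam: float, grad_train: Grad, grad_val: Grad,
              lr: float, lr_lam: float) -> Tuple[List[float], float]:
    """One alternating step of adaptive regularization with a single shared lambda."""
    # (1) Theta step: gradient step on the regularized training loss with the current lambda.
    g = grad_train(theta)
    new_theta = [t - lr * (gi + 2.0 * lam * t) for t, gi in zip(theta, g)]
    # (2) Lambda step: new_theta depends on lambda via d new_theta / d lambda = -2 * lr * theta,
    #     so d L_val / d lambda = grad_val(new_theta) . (-2 * lr * theta).
    g_val = grad_val(new_theta)
    d_lam = sum(gv * (-2.0 * lr * t) for gv, t in zip(g_val, theta))
    new_lam = max(0.0, lam - lr_lam * d_lam)  # keep lambda non-negative
    return new_theta, new_lam
```

If the validation loss prefers smaller parameters than the training loss (for example a training optimum at 3 and a validation optimum at 1), λ grows; if it prefers larger ones, λ is pushed down and clipped at 0. This is the sense in which the regularization is adapted automatically instead of being grid-searched.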
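Slide 11 lists BM25 among the content relevance and semantic expansion features. A minimal Okapi BM25 scorer is sketched below; the parameters k1 = 1.2 and b = 0.75, the IDF variant log((N - n_t + 0.5) / (n_t + 0.5) + 1), and pre-tokenized input are assumptions, since the slides do not give the authors' settings. The same function could score each text pair named on slide 11 (query & tweet, query & topic info, expanded query & tweet, expanded query & topic info) by passing the corresponding term lists.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue  # absent terms contribute nothing
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1.0 - b + b * len(d) / avgdl)  # length normalization
            s += idf * tf[t] * (k1 + 1.0) / norm
        scores.append(s)
    return scores
```

For the query ["plane"] over the tweets ["small", "plane", "crashes"], ["airport", "news"] and ["plane", "airport"], the second tweet scores 0 and the shorter third tweet scores higher than the first, because b penalizes longer documents.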
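Slide 12 evaluates with P@30 and MAP. The standard definitions are sketched below, assuming tweet IDs as strings and binary relevance given as a set per query; TREC-specific details of the microblog track evaluation are not modeled, and the function names are illustrative.

```python
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (missing ranks count as non-relevant)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant tweet, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP over the queries in qrels; a query missing from runs contributes AP = 0."""
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)
```

For a ranking t1, t2, t3, t4 with relevant tweets {t1, t3}, P@2 is 0.5 and AP is (1/1 + 2/3) / 2 = 5/6; P@30 is the same computation with k = 30.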
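The percentages quoted on slide 14 can be checked directly against the P@30 values in that slide's table. The short computation below (the helper name is illustrative) shows that the 7% and 4% figures match RFM_FullSGD measured against RSVM_Full and hitURLrun3 respectively.

```python
def relative_improvement(new: float, base: float) -> float:
    """Relative gain of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# P@30 values from slide 14.
p30 = {"KL2SFBLoc": 0.2441, "RSVM_Full": 0.2616, "hitURLrun3": 0.2701,
       "RFM_FullSGD": 0.2808, "RFM_FullAR": 0.2746}

gains = {b: round(relative_improvement(p30["RFM_FullSGD"], p30[b]), 1)
         for b in ("KL2SFBLoc", "RSVM_Full", "hitURLrun3")}
# gains == {"KL2SFBLoc": 15.0, "RSVM_Full": 7.3, "hitURLrun3": 4.0}
```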