This document summarizes a paper on using a Bayesian network model for link-based text classification. The authors propose learning the interactions between document categories from data using a Bayesian network, rather than fixing the network structure. They model the probability of a document's category given both its content and the categories of linked documents. Experimental results show their Bayesian network approach improves over baseline classifiers, particularly providing around a 10% boost when used with an OR gate classifier. The authors conclude the model is valuable and parametrizable, and they discuss exploring other base classifiers and more experiments.
Outline: Introduction, Our solution, The Bayesian network model, Results, Conclusions and future works
Link-based text classification using Bayesian networks
Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Andrés R. Masegosa, Alfonso E. Romero
{lci,jmfluna,jhg,andrew,aeromero}@decsai.ugr.es
Departamento de Ciencias de la Computación e Inteligencia Artificial
E.T.S.I. Informática y de Telecomunicación,
CITIC-UGR, Universidad de Granada
18071 – Granada, Spain
INEX 2009 Workshop, Brisbane
Our participation
Universidad de Granada at INEX 2009
This is the third year we participate in XML mining (classification).
As on previous occasions, we are interested in Bayesian networks.
We have provided a new solution to this problem.
Sorry, no AdHoc track this year.
Our participation
The problem itself
A text (XML) categorization problem with a training/test corpus (same as previous years).
Multilabel: more than one category per document (new this year!).
Links among files (training and test) given in a matrix (same as 2008).
Vectors of indexed terms (normalized tf-idf) provided. The eternal question: what about the XML structure?
Our solution (2008)
Encyclopedia regularity: a document of category Ci tends to link to documents of the same category. Graphically verified on the training set.
In 2008 we combined a flat-text classifier (Naïve Bayes) with a Bayesian network of fixed structure which modelled the interaction among categories, using learnt probabilities P(ci|cj).
Results were modest (the worst model among the three, and improvements over our baseline were not significant).
Our starting point (2009)
We detected the same regularity among categories (no matrix plot this year).
There is a possible (hidden) hierarchy (for example Portal:Religion, Portal:Christianity and Portal:Catholicism).
This year we learn the interactions among categories from the data: the structure is not fixed in advance, but can be any structure over the set of categories.
Modeling link structure I
We assume there is a global probability distribution over all these variables, and we model it with a Bayesian network.
Variables: categories Ci (39), categories of incoming links Ej (39) and terms Tk (many).
Main assumption: the content of a document and the categories of the documents that link to it are independent given the document's category. Symbolically:
p(dj, ej | ci) = p(dj | ci) p(ej | ci).
Modeling link structure II
We then derive the conditional probability p(ci | dj, ej), applying Bayes' rule and the independence assumption:

\begin{align*}
p(c_i \mid d_j, e_j) &= \frac{p(d_j, e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)}
 = \frac{p(d_j \mid c_i)\, p(e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)} \\
&= \frac{p(c_i \mid d_j)\, p(d_j)\, p(e_j \mid c_i)\, p(c_i)}{p(c_i)\, p(d_j, e_j)} \\
&= \frac{p(c_i \mid d_j)\, p(d_j)\, p(c_i \mid e_j)\, p(e_j)}{p(c_i)\, p(d_j, e_j)} \\
&= \frac{p(d_j)\, p(e_j)}{p(d_j, e_j)} \cdot \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)}.
\end{align*}

Since the factor p(dj) p(ej) / p(dj, ej) does not depend on the category,

\[
p(c_i \mid d_j, e_j) \propto \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)},
\]

and, normalizing over the two values of the binary category variable (ci and its complement $\bar{c}_i$),

\[
p(c_i \mid d_j, e_j) = \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)/p(c_i)}
{p(c_i \mid d_j)\, p(c_i \mid e_j)/p(c_i) + p(\bar{c}_i \mid d_j)\, p(\bar{c}_i \mid e_j)/p(\bar{c}_i)}.
\]
Modeling link structure III
p(ci|dj): the output of a probabilistic classifier. Any probabilistic classifier will do.
p(ci|ej): the probability of belonging to Ci given the set of categories of the incoming (known) links. This is modeled by the Bayesian network.
The problem then reduces to the following procedure.
Modeling link structure IV
We have a vector of 39+39 binary variables for each document: 39 for the categories (1 if the document is of that category, 0 if not), and 39 more for the links (1 if the document is linked to by documents of that category, 0 if not).
With a learning algorithm, we learn a Bayesian network from these data.
For each document to classify and for each category Ci, we compute its content probability p(ci|dj) (with the base classifier) and the probability of belonging to Ci given the categories of certain neighbours, p(ci|ej) (with the learnt Bayesian network).
We combine them using the normalization equation above.
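As an illustration, the combination step fits in a few lines of Python. This is a sketch of the normalization equation, not code from the paper; the function name and the example numbers are hypothetical:

```python
def combine(p_c_given_d, p_c_given_e, p_c):
    """Combine the content-based posterior p(ci|dj) and the link-based
    posterior p(ci|ej) via p(ci|dj,ej) ∝ p(ci|dj) p(ci|ej) / p(ci),
    normalizing over the binary variable (category vs. its complement)."""
    num = p_c_given_d * p_c_given_e / p_c
    den = num + (1 - p_c_given_d) * (1 - p_c_given_e) / (1 - p_c)
    return num / den

# Hypothetical example: the content classifier says 0.7, the link model
# says 0.8, and the category prior is 0.1; two agreeing sources push the
# combined posterior well above either input.
print(combine(0.7, 0.8, 0.1))  # ≈ 0.988
```

Note how the low prior p(ci) in the denominator rewards agreement between the two sources: each posterior individually exceeds the prior, so their product, discounted by the prior only once, is stronger evidence than either alone.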
Learning link structure
We learn the Bayesian network using the WEKA package:
Hill-climbing algorithm (easy and fast).
BDeu metric.
Three parents maximum per node.
Propagation is done with Elvira (WEKA does not provide propagation algorithms):
We compute p(ci) (once) and p(ci|ej) (for each document j).
Exact propagation was slow!
We used an Importance Sampling algorithm (approximate) instead.
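The structure-learning step can be illustrated with a self-contained sketch, not the WEKA implementation: greedy hill-climbing that only adds arcs, scores candidates with a simplified BDeu local score for binary variables, and enforces the three-parent limit. All names are hypothetical:

```python
import math
from itertools import product

def bdeu_score(data, child, parents, ess=1.0):
    # Simplified BDeu local score for a binary child with binary parents.
    # data: list of dicts mapping variable name -> 0/1.
    q = 2 ** len(parents)          # number of parent configurations
    a_ij = ess / q                 # Dirichlet prior mass per configuration
    a_ijk = ess / (2 * q)          # ... per configuration and child state
    counts = {}
    for row in data:
        key = (tuple(row[p] for p in parents), row[child])
        counts[key] = counts.get(key, 0) + 1
    score = 0.0
    for cfg in product((0, 1), repeat=len(parents)):
        n_ij = sum(counts.get((cfg, k), 0) for k in (0, 1))
        score += math.lgamma(a_ij) - math.lgamma(a_ij + n_ij)
        for k in (0, 1):
            score += math.lgamma(a_ijk + counts.get((cfg, k), 0)) - math.lgamma(a_ijk)
    return score

def creates_cycle(parents, u, v):
    # Would adding the arc u -> v close a directed cycle (u reachable from v)?
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(w for w in parents if node in parents[w])
    return False

def hill_climb(data, variables, max_parents=3, ess=1.0):
    # Greedy search: repeatedly add the single arc with the best score gain.
    parents = {v: [] for v in variables}
    scores = {v: bdeu_score(data, v, [], ess) for v in variables}
    while True:
        best = None
        for u in variables:
            for v in variables:
                if (u == v or u in parents[v]
                        or len(parents[v]) >= max_parents
                        or creates_cycle(parents, u, v)):
                    continue
                gain = bdeu_score(data, v, parents[v] + [u], ess) - scores[v]
                if gain > 1e-9 and (best is None or gain > best[0]):
                    best = (gain, u, v)
        if best is None:
            return parents
        gain, u, v = best
        parents[v].append(u)
        scores[v] += gain
```

On data like ours this would run over the 39+39 binary category and link variables; since the BDeu score is decomposable, only the local score of the arc's target node needs recomputing at each step.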
Base classifiers
We have used Multinomial Naïve Bayes (binary) and the Bayesian OR Gate (a model presented by our group at INEX 2007).
They are extensively described in the paper (read it if you want to learn more about these two classifiers).
Any other probabilistic classifier can be used to obtain p(ci|dj) first (any suggestions or preferences?).
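For illustration, a minimal binary (one-vs-rest) multinomial Naïve Bayes returning p(ci|dj) could look as follows. This is a generic textbook sketch, not the exact classifier from the paper, and all names and example terms are hypothetical:

```python
import math

def train_mnb(docs, y, vocab_size):
    # docs: list of {term: count}; y: 1 if the document belongs to Ci, else 0.
    # Assumes both classes are present in the training data.
    prior = sum(y) / len(y)
    counts, totals = {0: {}, 1: {}}, {0: 0, 1: 0}
    for doc, label in zip(docs, y):
        for term, c in doc.items():
            counts[label][term] = counts[label].get(term, 0) + c
            totals[label] += c
    return prior, counts, totals, vocab_size

def p_c_given_d(model, doc):
    # Posterior p(ci|dj) from Laplace-smoothed per-class term probabilities.
    prior, counts, totals, vocab_size = model
    log_odds = math.log(prior / (1 - prior))
    for term, c in doc.items():
        p1 = (counts[1].get(term, 0) + 1) / (totals[1] + vocab_size)
        p0 = (counts[0].get(term, 0) + 1) / (totals[0] + vocab_size)
        log_odds += c * math.log(p1 / p0)
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical toy corpus: two sports documents (in Ci) and two religion
# documents (not in Ci).
docs = [{'ball': 3}, {'ball': 2, 'game': 1}, {'church': 3}, {'faith': 2, 'church': 1}]
model = train_mnb(docs, [1, 1, 0, 0], vocab_size=4)
print(p_c_given_d(model, {'ball': 2}))  # high: clearly inside the category
```

One such model per category yields the 39 content probabilities p(ci|dj) that the combination equation consumes.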
Results

                MACC     µACC     MROC     µROC     MPRF     µPRF     MAP
N. Bayes        0.95142  0.93284  0.80260  0.81992  0.49613  0.52670  0.64097
N. Bayes + BN   0.95235  0.93386  0.80209  0.81974  0.50015  0.53029  0.64235
OR gate         0.75420  0.67806  0.92526  0.92163  0.25310  0.26268  0.72955
OR gate + BN    0.84768  0.81891  0.92810  0.92739  0.31611  0.36036  0.72508

Initial results.

Problem with the OR gate! The evaluation assumes dj ∈ Ci ⇔ p(ci|dj) > 0.5. This is not, in general, true for the OR gate, so some scaling procedure (like the SCut strategy) is needed.

                MACC     µACC     MROC     µROC     MPRF     µPRF     MAP
OR gate         0.92932  0.92612  0.92526  0.92163  0.45966  0.50407  0.72955
OR gate + BN    0.96607  0.95588  0.92810  0.92739  0.51729  0.55116  0.72508

Scaled results (see the paper for details).
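The scaling idea can be sketched as follows: per category, pick the validation-set threshold that maximizes F1 (an SCut-style choice), then assign the category when the score exceeds that threshold instead of 0.5. A hypothetical illustration, not the exact procedure from the paper:

```python
def scut_threshold(scores, y_true, grid=None):
    # SCut-style per-category tuning: choose the decision threshold that
    # maximizes F1 on a validation set (scores may be arbitrarily scaled,
    # as with the OR gate, so 0.5 is not a meaningful cutoff).
    grid = grid or [i / 100 for i in range(1, 100)]
    def f1(th):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= th and y == 0)
        fn = sum(1 for s, y in zip(scores, y_true) if s < th and y == 1)
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return max(grid, key=f1)

# Hypothetical validation scores from the OR gate for one category: a
# document is then assigned to the category iff its score >= th.
th = scut_threshold([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])
```

Only the threshold-dependent measures (accuracy, precision/recall F) change under this rescaling; ROC and MAP depend on the ranking alone, which is why those columns are identical in the two tables.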
Conclusions
The model is new, parametrizable (learning algorithm, parameters of the algorithm, base classifier, ...) and valuable by itself (it always improves the baseline).
Using the Bayesian network over the OR gate provides around a 10% improvement in some measures.
Good results on ROC (we ranked third).
Other base classifiers? SVM with probabilistic outputs, Logistic Regression, ...
More experiments for the final version of the paper!
Thank you for your attention!
Questions, comments, criticism?
<SPAM>Expecting to defend my PhD by April 2010; searching for a PostDoc position (in Europe) for 2010 on ML/IR related topics. Any offers?</SPAM>