Introduction      Our solution       The Bayesian network model   Results   Conclusions and future works




                 Link-based text classification using
                         Bayesian networks

               Luis M. de Campos Juan M. Fernández-Luna
                     Juan F. Huete Andrés R. Masegosa
                              Alfonso E. Romero
          {lci,jmfluna,jhg,andrew,aeromero}@decsai.ugr.es

               Departamento de Ciencias de la Computación e Inteligencia Artificial
                           E.T.S.I. Informática y de Telecomunicación,
                              CITIC-UGR, Universidad de Granada
                                     18071 – Granada, Spain

                                 INEX 2009 Workshop, Brisbane


Our participation

Universidad de Granada at INEX 2009



                This is the third year we participate in the XML Mining
                track (classification).


                As on previous occasions, we are interested in Bayesian
                networks.


                We have provided a new solution to this problem.


                Sorry, no Ad Hoc track this year.


Our participation

The problem itself




                A text (XML) categorization problem. Training/test corpus.
                    Same as previous years
                Multilabel (more than 1 category per doc).
                    New this year!
                Links among files (training, test) given in a matrix.
                    Same as 2008
                Vectors of indexed terms (normalized tf-idf) provided.
                    The eternal question, what about XML?




Our solution (2008)



               Encyclopedia regularity: a document of category $C_i$ tends
               to link to documents of the same category. Verified
               graphically on the training set.


               In 2008 we combined a flat-text classifier (Naïve Bayes)
               with a Bayesian network of fixed structure that modelled
               the interactions among categories, using learnt probabilities
               $p(c_i \mid c_j)$.


               Results were modest (the worst model among 3, and
               improvements over our baseline were not significant).




Our starting point (2009)



               We detected the same regularity among categories (no matrix
               plot this year).


               Possible (hidden) hierarchy among categories (for example,
               Portal:Religion, Portal:Christianity and
               Portal:Catholicism).


               This year we learn the interactions among categories from
               data: the structure is not fixed in advance, and any
               structure over the set of categories may be learnt.


Modeling link structure

Modeling link structure I


                We assume there is a global probability distribution
                over all the variables involved, and we model it with a
                Bayesian network.


                Variables: categories $C_i$ (39), categories of incoming links
                $E_j$ (39) and terms $T_k$ (many).


                Main assumption: the content of a document and the
                categories of the files that link to it are conditionally
                independent given the document's category. Or symbolically:

                \[ p(d_j, e_j \mid c_i) = p(d_j \mid c_i)\, p(e_j \mid c_i). \]
Modeling link structure

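       We then search for the conditional probability $p(c_i \mid d_j, e_j)$:

       \begin{align*}
       p(c_i \mid d_j, e_j)
         &= \frac{p(d_j, e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)}
          = \frac{p(d_j \mid c_i)\, p(e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)} \\
         &= \frac{p(c_i \mid d_j)\, p(d_j)\, p(e_j \mid c_i)\, p(c_i)}{p(c_i)\, p(d_j, e_j)} \\
         &= \frac{p(c_i \mid d_j)\, p(d_j)\, p(c_i \mid e_j)\, p(e_j)}{p(c_i)\, p(d_j, e_j)} \\
         &= \frac{p(d_j)\, p(e_j)}{p(d_j, e_j)} \cdot \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)}.
       \end{align*}

       \[ p(c_i \mid d_j, e_j) \propto \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)} \]

       \[ p(c_i \mid d_j, e_j) = \frac{p(c_i \mid d_j)\, p(c_i \mid e_j) / p(c_i)}{p(c_i \mid d_j)\, p(c_i \mid e_j) / p(c_i) + p(\bar{c}_i \mid d_j)\, p(\bar{c}_i \mid e_j) / p(\bar{c}_i)} \]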
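
The final normalized equation can be evaluated directly once the three probabilities are known. Below is a minimal Python sketch of that computation. The function name `combine` and the example values are illustrative assumptions, not taken from the slides, and the complementary terms are obtained as one minus the corresponding probability because each category is a binary variable.

```python
def combine(p_c_given_d: float, p_c_given_e: float, p_c: float) -> float:
    """Combine content and link evidence for one binary category.

    Computes a / (a + b), where
      a = p(c|d) * p(c|e) / p(c)          (document belongs to the category)
      b = p(~c|d) * p(~c|e) / p(~c)       (document does not belong to it)
    """
    if not 0.0 < p_c < 1.0:
        raise ValueError("prior p(c) must lie strictly between 0 and 1")
    a = p_c_given_d * p_c_given_e / p_c
    b = (1.0 - p_c_given_d) * (1.0 - p_c_given_e) / (1.0 - p_c)
    if a + b == 0.0:
        # Happens only when the two sources contradict each other with certainty.
        raise ValueError("content and link evidence are incompatible")
    return a / (a + b)


# Hypothetical values: the link evidence equals the prior, so it adds nothing.
neutral = combine(0.6, 0.1, 0.1)
# Hypothetical values: links raise the probability of a rare category.
boosted = combine(0.5, 0.5, 0.1)
print(neutral, boosted)
```

When the link-based probability equals the prior, the result reduces to the content probability alone, which is a quick sanity check on the formula.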


Modeling link structure

Modeling link structure III




                $p(c_i \mid d_j)$: output of a probabilistic classifier. Any
                probabilistic classifier can be used.


                $p(c_i \mid e_j)$: probability that the document belongs to $C_i$
                given the categories of its (known) incoming links. This is
                modeled by the Bayesian network.


                The problem reduces to the following: [see next slide]


Modeling link structure

Modeling link structure IV

                We have a vector of 39+39 binary variables for each
                document: 39, one per category (1 if the document belongs
                to that category, 0 if not), and 39 more (1 if the document
                is linked from documents of that category, 0 if not).

                With a learning algorithm, we learn a Bayesian network
                from that data.

                For each document to classify, and for each category $C_i$, we
                compute its content probability $p(c_i \mid d_j)$ (with the base
                classifier), and the probability of belonging to $C_i$ given the
                categories of its known neighbours, $p(c_i \mid e_j)$ (with the
                learnt Bayesian network).

                We combine them using the normalized equation for
                $p(c_i \mid d_j, e_j)$ derived earlier (shown in blue on the slides).
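
The binary vectors described above could be assembled as in the following sketch. It assumes, purely for illustration, that labels come as a dictionary from document id to a set of category names and that links come as (source, target) pairs. The slides only say the links are given in a matrix, so these data structures, the helper name `build_vectors` and the toy data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_vectors(
    categories: List[str],
    labels: Dict[str, Set[str]],
    links: List[Tuple[str, str]],
) -> Dict[str, List[int]]:
    """Return one binary vector of length 2 * len(categories) per document.

    The first half marks the document's own categories; the second half
    marks the categories of the documents that link to it. Links whose
    source has no known labels are skipped.
    """
    index = {c: k for k, c in enumerate(categories)}
    n = len(categories)
    vectors = {doc: [0] * (2 * n) for doc in labels}
    for doc, cats in labels.items():
        for c in cats:
            vectors[doc][index[c]] = 1
    for source, target in links:
        if source in labels and target in vectors:
            for c in labels[source]:
                vectors[target][n + index[c]] = 1
    return vectors


# Toy example with 2 categories instead of 39 (hypothetical data).
cats = ["Religion", "Science"]
labels = {"a": {"Religion"}, "b": {"Religion", "Science"}, "c": {"Science"}}
links = [("a", "b"), ("c", "b"), ("b", "a")]
vectors = build_vectors(cats, labels, links)
for doc in sorted(vectors):
    print(doc, vectors[doc])
```

Each row of the resulting table is one training instance for the structure-learning step.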
Learning link structure



                Learning the Bayesian network, using the WEKA package.

                          Hill-climbing search (easy and fast).

                          BDeu metric.

                          At most three parents per node.


                Propagation, using Elvira (WEKA does not have
                propagation algorithms).

                          Compute $p(c_i)$ (once), and $p(c_i \mid e_j)$ (for each document $j$).

                          Exact propagation was slow!

                          Importance sampling algorithm (approximate) used instead.
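
To give an idea of what approximate propagation involves, here is a small sketch of likelihood weighting, one simple member of the importance sampling family. It is not Elvira's algorithm, whose details are not given in the slides. The four-node network and all its probabilities are invented for illustration.

```python
import random
from typing import Dict, List, Tuple

# A node is (name, parents, cpt); cpt maps a tuple of parent values to
# p(node = 1 | parents). Nodes must be listed in topological order.
Node = Tuple[str, Tuple[str, ...], Dict[Tuple[int, ...], float]]


def likelihood_weighting(
    network: List[Node],
    query: str,
    evidence: Dict[str, int],
    n_samples: int,
    rng: random.Random,
) -> float:
    """Estimate p(query = 1 | evidence) by likelihood weighting."""
    weighted_hits = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        sample: Dict[str, int] = {}
        weight = 1.0
        for name, parents, cpt in network:
            p_one = cpt[tuple(sample[p] for p in parents)]
            if name in evidence:
                # Evidence is clamped; the sample is weighted by its likelihood.
                value = evidence[name]
                weight *= p_one if value == 1 else 1.0 - p_one
            else:
                value = 1 if rng.random() < p_one else 0
            sample[name] = value
        total_weight += weight
        weighted_hits += weight * sample[query]
    if total_weight == 0.0:
        raise ValueError("evidence has zero probability under all samples")
    return weighted_hits / total_weight


# Hypothetical network: C1 -> C2, C1 -> E1, C2 -> E2.
network: List[Node] = [
    ("C1", (), {(): 0.1}),
    ("C2", ("C1",), {(1,): 0.7, (0,): 0.05}),
    ("E1", ("C1",), {(1,): 0.8, (0,): 0.2}),
    ("E2", ("C2",), {(1,): 0.6, (0,): 0.1}),
]
estimate = likelihood_weighting(
    network, "C1", {"E1": 1, "E2": 1}, 20000, random.Random(0)
)
print(estimate)
```

For this toy network the exact posterior is 0.036 / 0.0585, roughly 0.615, so the estimate should land close to that value. The cost grows linearly with the number of samples, which is the trade-off against exact propagation.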


Base classifiers

Base classifiers




                  We have used Multinomial Naïve Bayes (binary) and the
                  Bayesian OR gate (a model presented by our group at
                  INEX 2007).


                  They are described extensively in the paper (read it if you
                  want to learn more about these two classifiers).


                  Any other probabilistic classifier can be used to obtain
                  $p(c_i \mid d_j)$ in the first step (any suggestions or preferences?).
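
As a rough illustration of the second base classifier, the sketch below implements a generic noisy-OR gate, in which each term present in the document can independently trigger the category. This is the textbook form and may differ from the exact parameterization in the authors' paper. The function name `noisy_or` and the term weights are hypothetical.

```python
from typing import Dict, Iterable


def noisy_or(term_weights: Dict[str, float], doc_terms: Iterable[str]) -> float:
    """p(c | d) under a canonical noisy-OR gate: 1 - prod_t (1 - w_t).

    term_weights[t] is the (hypothetical) probability that term t alone
    triggers the category; terms without a weight contribute nothing.
    """
    p_not = 1.0
    for term in set(doc_terms):
        p_not *= 1.0 - term_weights.get(term, 0.0)
    return 1.0 - p_not


# Hypothetical weights for one category.
weights = {"bishop": 0.5, "mass": 0.2, "church": 0.5}
strong = noisy_or(weights, ["bishop", "church", "goal"])
weak = noisy_or(weights, ["mass"])
print(strong, weak)
```

Scores from a gate of this kind need not be calibrated around 0.5, which is why a per-category threshold or rescaling matters when the evaluation uses a fixed cut-off.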




Results




                                 MACC       µACC        MROC      µROC      MPRF      µPRF        MAP
                  N. Bayes      0.95142    0.93284     0.80260   0.81992   0.49613   0.52670    0.64097
               N. Bayes + BN    0.95235    0.93386     0.80209   0.81974   0.50015   0.53029    0.64235
                  OR gate       0.75420    0.67806     0.92526   0.92163   0.25310   0.26268    0.72955
               OR gate + BN     0.84768    0.81891     0.92810   0.92739   0.31611   0.36036    0.72508


                                                     Initial results

       Problem with the OR gate! The evaluation assumes
       $d_j \in C_i \Leftrightarrow p(c_i \mid d_j) > 0.5$. This is not, in general, true for the
       OR gate, which needs some scaling procedure (like the SCut strategy).
                                 MACC       µACC        MROC      µROC      MPRF      µPRF        MAP
                 OR gate        0.92932    0.92612     0.92526   0.92163   0.45966   0.50407    0.72955
               OR gate + BN     0.96607    0.95588     0.92810   0.92739   0.51729   0.55116    0.72508


                               Scaled results (see paper for details).
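
The scaling actually used is described in the paper, not in these slides. As a hedged sketch of the general idea, the code below picks a per-category threshold on held-out data by maximizing F1 (in the spirit of SCut) and then applies a piecewise-linear rescaling that sends that threshold to 0.5. The rescaling function, the helper names and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def f1(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 of the rule 'predict positive when score > threshold'."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def scut_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick, for one category, the threshold with the best held-out F1."""
    distinct = sorted(set(scores))
    # Candidate thresholds: midpoints between consecutive distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return 0.5
    return max(candidates, key=lambda t: f1(scores, labels, t))


def rescale(score: float, threshold: float) -> float:
    """Monotonic piecewise-linear map with 0 -> 0, threshold -> 0.5, 1 -> 1."""
    if score <= threshold:
        return 0.5 * score / threshold
    return 0.5 + 0.5 * (score - threshold) / (1.0 - threshold)


# Hypothetical held-out scores for one category: every score exceeds 0.5.
scores = [0.95, 0.9, 0.85, 0.7, 0.6]
labels = [1, 1, 0, 0, 0]
best = scut_threshold(scores, labels)
print(best, [round(rescale(s, best), 3) for s in scores])
```

Because the rescaling is monotonic, it changes decisions at the 0.5 cut-off but not the ranking of documents. That is consistent with the ROC and MAP columns being identical in the initial and scaled tables.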
Conclusions

               The model is new, parameterizable (learning algorithm,
               parameters of the algorithm, base classifier, ...) and valuable
               in itself (it improves on the baseline in most measures).


               Using the Bayesian network on top of the OR gate provides an
               improvement of about 10% in some measures.


               Good results on ROC (ranked third).


               Other base classifiers? SVM with probabilistic outputs,
               logistic regression...


               More experiments for the final version of the paper!




               Thank you for your
               attention!
               Questions, comments, criticism?

               <SPAM>Expecting to defend my PhD by April 2010,
               looking for a postdoc position (in Europe) for 2010 on
               ML/IR related topics. Any offers?</SPAM>

VIP Model Call Girls Koregaon Park ( Pune ) Call ON 8005736733 Starting From ...
 
❤Personal Whatsapp Number Keylong Call Girls 8617697112 💦✅.
❤Personal Whatsapp Number Keylong Call Girls 8617697112 💦✅.❤Personal Whatsapp Number Keylong Call Girls 8617697112 💦✅.
❤Personal Whatsapp Number Keylong Call Girls 8617697112 💦✅.
 
Hotel And Home Service Available Kolkata Call Girls Howrah ✔ 6297143586 ✔Call...
Hotel And Home Service Available Kolkata Call Girls Howrah ✔ 6297143586 ✔Call...Hotel And Home Service Available Kolkata Call Girls Howrah ✔ 6297143586 ✔Call...
Hotel And Home Service Available Kolkata Call Girls Howrah ✔ 6297143586 ✔Call...
 
𓀤Call On 6297143586 𓀤 Ultadanga Call Girls In All Kolkata 24/7 Provide Call W...
𓀤Call On 6297143586 𓀤 Ultadanga Call Girls In All Kolkata 24/7 Provide Call W...𓀤Call On 6297143586 𓀤 Ultadanga Call Girls In All Kolkata 24/7 Provide Call W...
𓀤Call On 6297143586 𓀤 Ultadanga Call Girls In All Kolkata 24/7 Provide Call W...
 

Link-based document classification using Bayesian Networks

  • 1. Introduction | Our solution | The Bayesian network model | Results | Conclusions and future works. Link-based text classification using Bayesian networks. Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Andrés R. Masegosa, Alfonso E. Romero ({lci,jmfluna,jhg,andrew,aeromero}@decsai.ugr.es). Departamento de Ciencias de la Computación e Inteligencia Artificial, E.T.S.I. Informática y de Telecomunicación, CITIC-UGR, Universidad de Granada, 18071 – Granada, Spain. INEX 2009 Workshop, Brisbane.
  • 2. Our participation: Universidad de Granada at INEX 2009. This is the third year we have participated in XML mining (classification). As on previous occasions, we are interested in Bayesian networks. We have provided a new solution to this problem. Sorry, no AdHoc this year.
  • 4. Our participation: the problem itself. A text (XML) categorization problem with a training/test corpus (same as previous years). Multilabel, with more than one category per document (new this year!). Links among files (training, test) given in a matrix (same as 2008). Vectors of indexed terms (normalized tf-idf) provided (the eternal question: what about XML?).
  • 5. Our solution (2008). Encyclopedia regularity: a document of category $C_i$ tends to link to documents of the same category. This was verified graphically on the training set. In 2008 we combined a flat-text classifier (Naïve Bayes) with a Bayesian network of fixed structure that modelled the interactions among categories, using learnt probabilities $P(c_i \mid c_j)$. Results were modest: ours was the worst of 3 models, and the improvements over our baseline were not significant.
  • 6. Our starting point (2009). We detected the same regularity among categories (no matrix plot this year). There is a possible (hidden) hierarchy, for example Portal:Religion, Portal:Christianity and Portal:Catholicism. This year we learn the interactions among categories from data: the structure is not fixed in advance, and any structure over the set of categories is allowed.
  • 7. Modeling link structure I. We assume there is a global probability distribution over all these variables, and we model it with a Bayesian network. Variables: categories $C_i$ (39), categories of incoming links $E_j$ (39) and terms $T_k$ (many). Main assumption: the content of a document and the categories of the files that link to it are independent given the category. Symbolically: $p(d_j, e_j \mid c_i) = p(d_j \mid c_i)\, p(e_j \mid c_i)$.
  • 10. Modeling link structure. We then search for the conditional probability $p(c_i \mid d_j, e_j)$:

    $$
    \begin{aligned}
    p(c_i \mid d_j, e_j)
      &= \frac{p(d_j, e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)}
       = \frac{p(d_j \mid c_i)\, p(e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)} \\
      &= \frac{p(c_i \mid d_j)\, p(d_j)}{p(c_i)} \cdot \frac{p(e_j \mid c_i)\, p(c_i)}{p(d_j, e_j)} \\
      &= \frac{p(c_i \mid d_j)\, p(d_j)}{p(c_i)} \cdot \frac{p(c_i \mid e_j)\, p(e_j)}{p(d_j, e_j)} \\
      &= \frac{p(d_j)\, p(e_j)}{p(d_j, e_j)} \cdot \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)} .
    \end{aligned}
    $$

    Dropping the factor that does not depend on $c_i$:

    $$
    p(c_i \mid d_j, e_j) \propto \frac{p(c_i \mid d_j)\, p(c_i \mid e_j)}{p(c_i)} .
    $$

    Normalizing over $c_i$ and its complement $\bar{c}_i$:

    $$
    p(c_i \mid d_j, e_j) =
      \frac{p(c_i \mid d_j)\, p(c_i \mid e_j) / p(c_i)}
           {p(c_i \mid d_j)\, p(c_i \mid e_j) / p(c_i) + p(\bar{c}_i \mid d_j)\, p(\bar{c}_i \mid e_j) / p(\bar{c}_i)} .
    $$
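As a minimal sketch of the normalized combination derived above, the following Python function takes, for one binary category, the content probability $p(c_i \mid d_j)$, the link probability $p(c_i \mid e_j)$ and the prior $p(c_i)$, and returns $p(c_i \mid d_j, e_j)$. The name `combine` and the fallback to the prior when both terms vanish are assumptions made for illustration; the slides do not say how that corner case was handled.

```python
def combine(p_content: float, p_links: float, prior: float) -> float:
    """Combine p(c|d) and p(c|e) into p(c|d,e) for one binary category.

    Implements  [p(c|d) p(c|e) / p(c)]  divided by the sum of that term
    and the same term computed for the complement of the category.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    # Unnormalized score for "document belongs to the category".
    pos = p_content * p_links / prior
    # Unnormalized score for the complement, using 1 - p for each term.
    neg = (1.0 - p_content) * (1.0 - p_links) / (1.0 - prior)

    total = pos + neg
    if total == 0.0:
        # Both sources contradict each other with certainty.
        # Falling back to the prior is an assumption of this sketch.
        return prior
    return pos / total


# Links agree with the prior, so they add no information.
same_as_content = combine(p_content=0.7, p_links=0.2, prior=0.2)
# Both sources favour a rare category, so the evidence is reinforced.
reinforced = combine(p_content=0.6, p_links=0.6, prior=0.1)
```

When the link probability equals the prior, the result reduces to the content probability, which is a quick sanity check on the formula.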
  • 11. Modeling link structure III. $p(c_i \mid d_j)$: the output of a probabilistic classifier; any probabilistic classifier will do. $p(c_i \mid e_j)$: the probability of belonging to $C_i$ given the set of categories of the incoming (known) links; this is modeled by the Bayesian network. The problem reduces to the following: [see next slide]
  • 12. Modeling link structure IV. We have a vector of 39+39 binary variables for each document: 39 category variables (1 if the document belongs to that category, 0 if not), and 39 more for incoming links (1 if the document is linked from documents of that category, 0 if not). With a learning algorithm, we learn a Bayesian network from that data. For each document to classify and for each category $C_i$, we compute its content probability $p(c_i \mid d_j)$ (with the base classifier) and the probability of belonging to $C_i$ given the categories of its (known) incoming neighbours, $p(c_i \mid e_j)$ (with the learnt Bayesian network). We combine them using the normalized equation for $p(c_i \mid d_j, e_j)$ (shown in blue on the slides).
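The construction of the 39+39 binary vectors can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name `link_features`, the dictionary-based inputs and the toy data with 3 categories are hypothetical, and the real corpus provides the links as a matrix.

```python
def link_features(labels, in_links, n_categories):
    """Build one binary row per labelled document.

    labels:    dict mapping a document id to the set of its category indices
    in_links:  dict mapping a document id to the ids of documents linking to it
    Returns a dict mapping each document id to a list of 2 * n_categories
    values: first the category indicators, then the incoming-link indicators.
    """
    rows = {}
    for doc, cats in labels.items():
        own = [1 if c in cats else 0 for c in range(n_categories)]

        # Collect the categories of every linking document whose labels are known.
        incoming = set()
        for src in in_links.get(doc, ()):
            incoming |= labels.get(src, set())

        linked = [1 if c in incoming else 0 for c in range(n_categories)]
        rows[doc] = own + linked
    return rows


# Toy data (hypothetical): 3 categories instead of 39.
toy_labels = {"a": {0}, "b": {0, 2}, "c": {1}}
toy_in_links = {"a": ["b"], "c": ["a", "b"]}
toy_rows = link_features(toy_labels, toy_in_links, n_categories=3)
```

Rows of this shape are what a structure-learning algorithm would receive as training data. For a test document only the second half of the row is observed, and it serves as evidence.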
  • 16. Learning link structure. Learning the Bayesian network, using the WEKA package: hill-climbing algorithm (easy and fast), BDeu metric, at most three parents per node. Propagation, using Elvira (WEKA does not have propagation algorithms): compute $p(c_i)$ (once) and $p(c_i \mid e_j)$ (for each document $j$). Exact propagation was slow, so we used an importance sampling algorithm (approximate).
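To give a feel for approximate propagation, here is a small pure-Python sketch of likelihood weighting, one of the simplest importance sampling schemes for Bayesian networks. It is a stand-in chosen for brevity, not the algorithm implemented in Elvira, and the two-node network and its probabilities are hypothetical.

```python
import random


def likelihood_weighting(order, parents, cpt, evidence, query,
                         n_samples=20000, seed=0):
    """Estimate p(query = 1 | evidence) in a network of binary variables.

    order:    variable names in topological order
    parents:  dict mapping a variable to a tuple of its parents
    cpt:      dict mapping a variable to {parent values: p(variable = 1)}
    evidence: dict mapping observed variables to 0 or 1
    """
    rng = random.Random(seed)
    weighted_hits = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        sample, weight = {}, 1.0
        for var in order:
            p_one = cpt[var][tuple(sample[p] for p in parents[var])]
            if var in evidence:
                # Observed variables are clamped and contribute to the weight.
                sample[var] = evidence[var]
                weight *= p_one if evidence[var] == 1 else 1.0 - p_one
            else:
                # Unobserved variables are sampled from their conditional.
                sample[var] = 1 if rng.random() < p_one else 0
        total_weight += weight
        weighted_hits += weight * sample[query]
    return weighted_hits / total_weight


# Hypothetical toy network: category C influences link indicator E.
toy_order = ["C", "E"]
toy_parents = {"C": (), "E": ("C",)}
toy_cpt = {"C": {(): 0.3}, "E": {(1,): 0.8, (0,): 0.1}}

estimate = likelihood_weighting(toy_order, toy_parents, toy_cpt,
                                evidence={"E": 1}, query="C")
exact = 0.3 * 0.8 / (0.3 * 0.8 + 0.7 * 0.1)  # Bayes' rule on the toy network
```

On this toy network the estimate can be compared with the exact posterior. On a learnt network over 78 variables, exact inference is the expensive part that sampling avoids.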
  • 17. Base classifiers. We have used Multinomial Naïve Bayes (binary) and the Bayesian OR gate (a model presented by our group at INEX 2007). They are described extensively in the paper (read it if you want to learn about these two classifiers in depth). Any other probabilistic classifier can be used to first obtain $p(c_i \mid d_j)$ (any suggestions or preferences?).
  • 19. Results. Initial results:

                     MACC     µACC     MROC     µROC     MPRF     µPRF     MAP
    N. Bayes         0.95142  0.93284  0.80260  0.81992  0.49613  0.52670  0.64097
    N. Bayes + BN    0.95235  0.93386  0.80209  0.81974  0.50015  0.53029  0.64235
    OR gate          0.75420  0.67806  0.92526  0.92163  0.25310  0.26268  0.72955
    OR gate + BN     0.84768  0.81891  0.92810  0.92739  0.31611  0.36036  0.72508

    Problem in the OR gate! The evaluation assumes $d_j \in C_i \Leftrightarrow p(c_i \mid d_j) > 0.5$. This is not true in general for the OR gate, which needs some scaling procedure (like the SCut strategy).

    Scaled results (see paper for details):

                     MACC     µACC     MROC     µROC     MPRF     µPRF     MAP
    OR gate          0.92932  0.92612  0.92526  0.92163  0.45966  0.50407  0.72955
    OR gate + BN     0.96607  0.95588  0.92810  0.92739  0.51729  0.55116  0.72508
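The scaling issue can be illustrated with a sketch in the spirit of SCut: for each category, pick the score threshold that maximizes F1 on held-out data, then rescale the scores so that this threshold lands on 0.5. The slides defer the actual procedure to the paper, so everything here is an assumption for illustration. The names `scut_threshold` and `rescale`, the choice of F1 as the tuning criterion and the piecewise-linear rescaling are all hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, defined as 0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def scut_threshold(scores, y_true):
    """Return the threshold t (taken from the observed scores) whose rule
    'positive if score > t' gives the best F1 on the validation data."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        pred = [1 if s > t else 0 for s in scores]
        f1 = f1_score(y_true, pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def rescale(score, t):
    """Piecewise-linear map of [0, 1] onto itself that sends t to 0.5."""
    if score > t:
        return 0.5 + 0.5 * (score - t) / (1.0 - t)
    return 0.5 * score / t if t > 0 else 0.5


# Hypothetical validation scores for one category: all well below 0.5.
val_scores = [0.05, 0.10, 0.20, 0.30]
val_labels = [0, 0, 1, 1]
threshold = scut_threshold(val_scores, val_labels)
scaled = [rescale(s, threshold) for s in val_scores]
```

After rescaling, the fixed rule "probability greater than 0.5" used by the evaluation reproduces the tuned decision, while the ranking of documents is unchanged. This is consistent with the ROC and MAP columns being identical in the initial and scaled tables.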
  • 24. Conclusions. The model is new, parametrizable (learning algorithm, parameters of the algorithm, base classifier, ...) and valuable in itself (it always improves on a baseline). Using the Bayesian network on top of the OR gate provides a 10% improvement in some measures. Good results on ROC (ranked third). Other base classifiers? SVM with probabilistic outputs, logistic regression... More experiments for the final version of the paper!
  • 25. Thank you for your attention! Questions, comments, criticism? <SPAM>Expecting to defend my PhD by April 2010, searching for a PostDoc (in Europe) for 2010 on ML/IR related stuff. Any offers?</SPAM>