Paper presentation for the course Advanced Concepts in Machine Learning.
The paper is "Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data"
http://jmlr.org/proceedings/papers/v32/chenf14.pdf
Lifelong Topic Modelling presentation
1. Lifelong Topic Modelling
Paper Review Presentation
Daniele Di Mitri
Department of Knowledge Engineering
University of Maastricht
22nd May 2015
Daniele Di Mitri (DKE) Lifelong Topic Modelling 22nd May 2015 1 / 13
2. Chosen paper
Chen, Zhiyuan, and Bing Liu.
Topic Modeling using Topics from Many Domains, Lifelong Learning
and Big Data.
Proceedings of the 31st ICML conference, 2014
3. Outline
1 Topic modelling
LDA description
LDA limitations
2 Topic modelling using knowledge
Knowledge Based Topic modelling
3 Lifelong Topic modelling
Lifelong learning approach
The proposed algorithm
Incorporation of knowledge
4 Evaluation
5 Summary
4. Latent Dirichlet Allocation
some useful background
[Figure: Latent Dirichlet allocation (LDA) — example topics shown as word distributions (gene 0.04, dna 0.02, genetic 0.01, ...; life 0.02, evolve 0.01, organism 0.01, ...; brain 0.04, neuron 0.02, nerve 0.01, ...; data 0.02, number 0.02, computer 0.01, ...), alongside documents with their topic proportions and assignments.]
• Each topic is a distribution over words
• Each document is a mixture of corpus-wide topics
• Each word is drawn from one of those topics
Figure: David Blei, Probabilistic Topic Models, 2012
5. LDA limitations
Unsupervised models can produce incoherent topics
Example
LDA sample topics
D1 = {price, color, cost, life}
D2 = {cost, picture, price, expensive}
D3 = {price, money, customer, expensive}
These topics have incoherent words: color, life, picture, customer
6. Can we use Knowledge?
some related works
SUPERVISED
Topic models in supervised settings
E.g. Blei & McAuliffe (2007)
Assume all prior knowledge is correct
Use "regions" and "labels"
UNSUPERVISED
Knowledge-Based Topic Modelling
E.g. GK-LDA (Chen et al. 2013) and DF-LDA (Andrzejewski et al. 2009)
Typically assume that the given knowledge is correct
Do not automatically extract and target prior knowledge
7. Can we do better?
A fully automatic system to mine prior knowledge and deal with inconsistencies
INTUITION
If we find a set of words common to two domains, these can serve as
prior knowledge
Example
D1 ∩ D2 = {price, cost}
D2 ∩ D3 = {price, expensive}
These are prior knowledge sets (pk-sets)
Example (D1 improved)
D1 = {price, cost, expensive, color}
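The intersection example above can be reproduced directly with Python sets (the domain topics are copied from the slide):

```python
# Sample topics from three domains (from the slides).
d1 = {"price", "color", "cost", "life"}
d2 = {"cost", "picture", "price", "expensive"}
d3 = {"price", "money", "customer", "expensive"}

# Words shared by topics from different domains become candidate
# prior knowledge sets (pk-sets).
pk_sets = [sorted(d1 & d2), sorted(d2 & d3)]
print(pk_sets)  # → [['cost', 'price'], ['expensive', 'price']]
```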
8. Lifelong Learning approach
In 4 "simple" steps
1 Given a set of domains D = {D1, ..., Dn}, run plain LDA(Di) on each domain to generate prior topics (p-topics), pooled in the union S
2 Given a test domain Dt, run LTM(Dt) to generate current topics (c-topics) At
3 For each aj ∈ At, find the matching topics M_j^t ⊆ S (high-level knowledge for aj)
4 Mine M_j^t to generate pk-sets of length 2
Why Lifelong Learning? The knowledge learnt with LTM is retained by
adding it to (or replacing it in) the initial prior topics S.
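The four steps can be sketched as a loop over domains. This is a schematic with toy stand-ins (topics as plain word sets, `toy_lda` returning canned topics, and overlap-based matching in place of the paper's KL-divergence matching); the names and data are illustrative, not the authors' code:

```python
from itertools import combinations
from collections import Counter

def toy_lda(domain):
    # Step 1 stand-in: pretend LDA recovered these topics.
    return domain["topics"]

def match_topics(a_j, S, min_overlap=2):
    # Step 3 stand-in: prior topics sharing enough words with a_j
    # (the paper matches by symmetrised KL-divergence instead).
    return [s for s in S if len(a_j & s) >= min_overlap]

def mine_pk_sets(M_j, min_support=2):
    # Step 4: frequent word pairs across the matched prior topics.
    counts = Counter(p for t in M_j for p in combinations(sorted(t), 2))
    return [p for p, c in counts.items() if c >= min_support]

# Step 1: plain LDA on each past domain; pool the p-topics in S.
domains = [
    {"topics": [{"price", "color", "cost", "life"}]},
    {"topics": [{"cost", "picture", "price", "expensive"}]},
]
S = [t for d in domains for t in toy_lda(d)]

# Step 2: current topics (c-topics) from the test domain.
A_t = [{"price", "cost", "expensive"}]

# Steps 3-4: mine length-2 pk-sets for each current topic.
knowledge = [pk for a_j in A_t for pk in mine_pk_sets(match_topics(a_j, S))]
print(knowledge)  # → [('cost', 'price')]
```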
9. LTM algorithm
1 Run GibbsSampling(Dt, ∅) (equivalent to plain LDA) for N iterations
2 Run GibbsSampling(Dt, Kt) for N iterations, incorporating the knowledge Kt
3 Kt is updated at each iteration by matching each aj ∈ At to the prior
topics sk ∈ S with minimum symmetrised KL-divergence, then applying
Frequent Itemset Mining to generate frequent itemsets of length 2 (the pk-sets)
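The symmetrised KL-divergence used for the matching in step 3 can be computed as follows (the two word distributions are toy values, not from the paper):

```python
import numpy as np

def kl(p, q):
    # KL(p || q) for two discrete distributions (assumes no zero entries).
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def sym_kl(p, q):
    # Symmetrised KL-divergence: KL(p || q) + KL(q || p).
    return kl(p, q) + kl(q, p)

a_j = [0.5, 0.3, 0.2]  # word distribution of a current topic
s_k = [0.4, 0.4, 0.2]  # word distribution of a prior topic
print(sym_kl(a_j, s_k))
```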
10. How does LTM incorporate knowledge?
NB: d is incremented not by 1, but by a certain proportion, which is stored
in a matrix and determined using Pointwise Mutual Information:
PMI(w1, w2) = log [ P(w1, w2) / (P(w1) P(w2)) ]
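The PMI above can be estimated from document co-occurrence frequencies; a minimal sketch on an invented toy corpus:

```python
import math

# Toy corpus: each document as a set of words (invented for illustration).
docs = [
    {"price", "cost", "expensive"},
    {"price", "cost"},
    {"price", "color"},
    {"cost", "expensive"},
]
n = len(docs)

def p(*words):
    # Fraction of documents containing all the given words.
    return sum(all(w in d for w in words) for d in docs) / n

def pmi(w1, w2):
    return math.log(p(w1, w2) / (p(w1) * p(w2)))

print(round(pmi("cost", "expensive"), 3))  # → 0.288 (positively correlated)
```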
11. Evaluation
Tested against four baseline algorithms: LDA, DF-LDA, GK-LDA
and AKL
Average Topic Coherence as quality measure
Figure: Results of tests in settings 1 & 2
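Topic coherence here follows one common formulation (Mimno et al. 2011): for a topic's top words, sum log((D(wi, wj) + 1) / D(wj)) over word pairs, where D counts document (co-)occurrences. A toy sketch, with an invented corpus:

```python
import math

# Toy corpus (invented); doc_freq gives document (co-)occurrence counts.
docs = [
    {"price", "cost", "expensive"},
    {"price", "cost"},
    {"cost", "expensive"},
]

def doc_freq(*words):
    return sum(all(w in d for w in words) for d in docs)

def coherence(topic_words):
    # Sum of log((D(wi, wj) + 1) / D(wj)) over pairs of the topic's top words.
    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            score += math.log(
                (doc_freq(topic_words[i], topic_words[j]) + 1)
                / doc_freq(topic_words[j])
            )
    return score

print(round(coherence(["price", "cost", "expensive"]), 3))  # → 0.405
```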
12. In summary
Lifelong Topic Modelling
Learn prior knowledge
Fault tolerance
First Lifelong Learning Topic model
Big Data ready
However...
some points for improvement
Text corpora should be diversified (only Amazon reviews used)
Focus is mostly on the flow of the algorithm
The 2nd test setting and the Big Data experiments are not fully reported