[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametric Topic Model for Labeled Data


Shuyo Nakatani

Supervised Nonparametric Topic Model



2012/07/28
Nakatani Shuyo @ Cybozu Labs, Inc
twitter : @shuyo

LDA (Latent Dirichlet Allocation)
[Blei+ 03]
• Unsupervised Topic Model
– Each word has an unobserved topic
• Parametric
– The number of topics K is fixed in advance

via Wikipedia
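A minimal sketch of the LDA generative process, assuming NumPy. The function name and the default hyperparameters are illustrative assumptions. Note that K has to be chosen before anything is sampled, which is what "parametric" means here.

```python
import numpy as np


def generate_lda_corpus(n_docs, doc_len, K, V, alpha=0.5, eta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (K is fixed up front)."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, eta), size=K)         # topic-word distributions, (K, V)
    docs, topics = [], []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(K, alpha))          # document-topic proportions
        z = rng.choice(K, size=doc_len, p=theta)          # unobserved topic of each word
        w = np.array([rng.choice(V, p=beta[k]) for k in z])  # observed words
        docs.append(w)
        topics.append(z)
    return docs, topics, beta
```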

Labeled LDA [Ramage+ 09]

• Supervised Topic Model
– Each document has one or more observed labels
• Parametric

via [Ramage+ 09]

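Generative Process for L-LDA
• $\boldsymbol{\beta}_k \sim \mathrm{Dir}(\boldsymbol{\eta})$
• $\Lambda_k^{(d)} \sim \mathrm{Bernoulli}(\Phi_k)$ (topics corresponding to the observed labels)
• $\boldsymbol{\theta}^{(d)} \sim \mathrm{Dir}(\boldsymbol{\alpha}^{(d)})$
– where $\boldsymbol{\alpha}^{(d)} = \{\alpha_k \mid \Lambda_k^{(d)} = 1\}$ (parameters restricted to the observed labels)
• $z_i^{(d)} \sim \mathrm{Multi}(\boldsymbol{\theta}^{(d)})$
• $w_i^{(d)} \sim \mathrm{Multi}(\boldsymbol{\beta}_{z_i^{(d)}})$

via [Ramage+ 09]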
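A minimal sketch of the L-LDA generative process above, assuming NumPy. The function name, the defaults, and the redraw of documents that receive no label are illustrative assumptions, not part of [Ramage+ 09]. `alpha` may be a scalar or a per-label vector $(\alpha_k)$; only the entries of the observed labels reach the Dirichlet. The loop assumes at least one $\Phi_k > 0$.

```python
import numpy as np


def generate_llda_corpus(n_docs, doc_len, Phi, V, alpha=0.5, eta=0.1, seed=0):
    """Sample a toy corpus from the L-LDA generative process (one topic per label)."""
    rng = np.random.default_rng(seed)
    Phi = np.asarray(Phi, dtype=float)
    K = len(Phi)
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), (K,))  # per-label alpha_k
    beta = rng.dirichlet(np.full(V, eta), size=K)        # beta_k ~ Dir(eta)
    corpus = []
    for _ in range(n_docs):
        Lambda = rng.random(K) < Phi                     # Lambda_k^(d) ~ Bernoulli(Phi_k)
        while not Lambda.any():                          # assumption: redraw until >= 1 label
            Lambda = rng.random(K) < Phi
        labels = np.flatnonzero(Lambda)
        theta = rng.dirichlet(alpha[labels])             # Dir(alpha^(d)) over observed labels only
        z = labels[rng.choice(len(labels), size=doc_len, p=theta)]  # topics come from labels
        w = np.array([rng.choice(V, p=beta[k]) for k in z])         # w_i ~ Multi(beta_{z_i})
        corpus.append((labels, z, w))
    return corpus, beta
```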

Pros/Cons of L-LDA
• Pros
– Easy to implement

• Cons
– The label-topic correspondence must be specified manually
– Performance depends on that correspondence

via [Ramage+ 09]

Note: my implementation is here: https://github.com/shuyo/iir/blob/master/lda/llda.py
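To illustrate why L-LDA is easy to implement, here is a minimal collapsed Gibbs sampler sketch. It is not the linked llda.py; it assumes symmetric priors $\alpha$ and $\eta$ and one topic per label. The only change from plain LDA is that each $z_i$ is sampled among the document's labels, with $p(z_i = k \mid \cdot) \propto (n_{dk} + \alpha)(n_{kw} + \eta)/(n_k + V\eta)$ for $k$ in the label set and the current word removed from the counts.

```python
import numpy as np


def llda_gibbs(docs, labels, K, V, alpha=0.1, eta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for L-LDA: z_i is drawn only among labels[d]."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))   # words of document d assigned to topic k
    n_kw = np.zeros((K, V))           # occurrences of word w assigned to topic k
    n_k = np.zeros(K)                 # words assigned to topic k
    z = []
    for d, doc in enumerate(docs):    # random initialization restricted to the labels
        z_d = rng.choice(labels[d], size=len(doc))
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            lab = np.asarray(labels[d])
            for i, w in enumerate(doc):
                k = z[d][i]                       # remove the current assignment
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                p = (n_dk[d, lab] + alpha) * (n_kw[lab, w] + eta) / (n_k[lab] + V * eta)
                k = lab[rng.choice(len(lab), p=p / p.sum())]
                z[d][i] = k                       # add the new assignment back
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    phi = (n_kw + eta) / (n_k[:, None] + V * eta)  # posterior-mean topic-word distributions
    return z, phi
```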

DP-MRM [Kim+ 12]
– Dirichlet Process with Mixed Random Measures

• Supervised Topic Model
• Nonparametric
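– K is the number of labels, not the number of topics

(Figure: graphical model of DP-MRM; nodes $H$, $\beta$, $G_{0k}$, $\gamma_k$, $\eta$, $\lambda_j$, $\mathbf{r}_j$, $\alpha$, $G_j$, $\theta_{ji}$, $x_{ji}$; plates $N_j$, $D$, $K$)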

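Generative Process for DP-MRM
Each label has a random measure as its topic space.
• $H = \mathrm{Dir}(\beta)$
• $G_{0k} \sim \mathrm{DP}(\gamma_k, H(\beta))$ for each label $k = 1, \dots, K$
• $\lambda_j \sim \mathrm{Dir}(\mathbf{r}_j \eta)$ where $\mathbf{r}_j = \big(I(k \in \mathrm{label}_j)\big)_{k=1}^{K}$
• $G_j \sim \mathrm{DP}\big(\alpha, \sum_{k \in \mathrm{label}_j} \lambda_{jk} G_{0k}\big)$ (mixed random measures)
• $\theta_{ji} \sim G_j$, $x_{ji} \sim F(\theta_{ji}) = \mathrm{Multi}(\theta_{ji})$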
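The base measure of $G_j$ is a $\lambda_j$-weighted mixture of the label measures $G_{0k}$. The sketch below builds that mixture, assuming each $G_{0k}$ is already available as a finite list of weights and atoms (for example a truncated stick-breaking draw); this representation and the function name are hypothetical. Because $r_{jk} = 0$ outside $\mathrm{label}_j$, $\mathrm{Dir}(\mathbf{r}_j \eta)$ puts zero mass there, so the Dirichlet is sampled only over the document's labels (NumPy's `dirichlet` requires strictly positive parameters).

```python
import numpy as np


def mixed_base_measure(label_j, G0, eta=1.0, rng=None):
    """Return lambda_j and the discrete measure sum_{k in label_j} lambda_jk G_0k.

    G0[k] = (weights, atoms): a finite stand-in for G_0k (hypothetical
    representation); weights sum to 1 and atoms has one row per atom.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(G0)
    r_j = np.zeros(K)
    r_j[list(label_j)] = 1.0                        # r_j = indicator of k in label_j
    active = np.flatnonzero(r_j)
    lam = np.zeros(K)                               # entries outside label_j stay 0
    lam[active] = rng.dirichlet(eta * r_j[active])  # Dir(r_j * eta) on the active labels
    weights = np.concatenate([lam[k] * G0[k][0] for k in active])
    atoms = np.concatenate([G0[k][1] for k in active])
    return lam, weights, atoms
```

The mixed weights still sum to 1, since each $G_{0k}$'s weights sum to 1 and $\sum_k \lambda_{jk} = 1$.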

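Stick Breaking Process
• $v_{lk} \sim \mathrm{Beta}(1, \gamma_k)$, $\pi_{lk} = v_{lk} \prod_{d=0}^{l-1} (1 - v_{dk})$
• $\phi_{lk} \sim H$, $G_{0k} = \sum_{l=0}^{\infty} \pi_{lk} \delta_{\phi_{lk}}$
• $\lambda_j \sim \mathrm{Dir}(\mathbf{r}_j \eta)$, $w_{jt} \sim \mathrm{Beta}(1, \alpha)$, $\pi_{jt} = w_{jt} \prod_{d=0}^{t-1} (1 - w_{jd})$
• $k_{jt} \sim \mathrm{Multi}(\lambda_j)$, $\psi_{jt} \sim G_{0 k_{jt}}$, $G_j = \sum_{t=0}^{\infty} \pi_{jt} \delta_{\psi_{jt}}$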
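A truncated version of the stick-breaking construction above, as a sketch: each infinite sum is cut at $L$ sticks (for $G_{0k}$) or $T$ sticks (for $G_j$), and the last stick takes all remaining mass so the weights sum to 1. The truncation levels, the function names, and the symmetric $H = \mathrm{Dir}(\beta)$ over $V$ words are assumptions for illustration.

```python
import numpy as np


def stick_breaking(concentration, T, rng):
    """First T weights pi_t = v_t * prod_{d<t} (1 - v_d), v_t ~ Beta(1, concentration)."""
    v = rng.beta(1.0, concentration, size=T)
    v[-1] = 1.0   # truncation: the last stick takes the remaining mass
    return v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))


def sample_dpmrm_truncated(label_j, gamma, alpha, eta, V, beta, L=50, T=50, seed=0):
    """Truncated draw of G_0k (k = 1..K) and one document-level G_j."""
    rng = np.random.default_rng(seed)
    K = len(gamma)
    # G_0k = sum_l pi_lk delta(phi_lk), phi_lk ~ H = Dir(beta)
    pi0 = np.array([stick_breaking(gamma[k], L, rng) for k in range(K)])  # (K, L)
    phi = rng.dirichlet(np.full(V, beta), size=(K, L))                    # (K, L, V)
    # lambda_j ~ Dir(r_j eta): nonzero only on the document's labels
    labels = np.asarray(label_j)
    lam = np.zeros(K)
    lam[labels] = rng.dirichlet(np.full(len(labels), eta))
    # G_j = sum_t pi_jt delta(psi_jt), k_jt ~ Multi(lambda_j), psi_jt ~ G_{0 k_jt}
    pi_j = stick_breaking(alpha, T, rng)
    k_jt = rng.choice(K, size=T, p=lam)
    l_jt = np.array([rng.choice(L, p=pi0[k]) for k in k_jt])
    psi = phi[k_jt, l_jt]                                                 # (T, V)
    return pi_j, k_jt, l_jt, psi
```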

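Chinese Restaurant Franchise
• $t_{ji}$: table index of the $i$-th term in the $j$-th document
• $k_{jt}, l_{jt}$: dish indexes on the $t$-th table of the $j$-th document ($k_{jt}$ selects the label's DP $G_{0k}$, $l_{jt}$ the dish within it)
– In the standard HDP, this layer consists of only a single DP $G_0$, so a single dish index suffices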
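The franchise can be simulated from the prior with $\lambda_j$ and all $G$'s integrated out. In this sketch (our own, assuming a symmetric $\eta$ and integrating $\lambda_j$ out through the Dirichlet-multinomial predictive), a term joins table $t$ with probability $\propto n_{jt}$ or opens a new table $\propto \alpha$; a new table picks a label $k \in \mathrm{label}_j$ with probability $\propto m_{jk} + \eta$, then a dish $l$ of $G_{0k}$ with probability $\propto m_{kl}$, or a new dish $\propto \gamma_k$, where $m_{kl}$ counts tables over all documents. No words are generated, and all names are hypothetical.

```python
import numpy as np
from collections import defaultdict


def crf_prior_seating(doc_lengths, doc_labels, alpha, gamma, eta, seed=0):
    """Simulate t_ji and (k_jt, l_jt) from the DP-MRM franchise prior."""
    rng = np.random.default_rng(seed)
    m_kl = defaultdict(lambda: defaultdict(int))  # tables serving dish l of label k, all docs
    seating = []
    for N_j, labels in zip(doc_lengths, doc_labels):
        labels = list(labels)
        n_jt, dishes, t_j = [], [], []          # customers per table, (k, l) per table
        m_jk = {k: 0 for k in labels}           # tables in document j using label k
        for _ in range(N_j):
            p = np.array(n_jt + [alpha], dtype=float)
            t = int(rng.choice(len(p), p=p / p.sum()))
            if t == len(n_jt):                  # new table: choose label k, then dish l
                pk = np.array([m_jk[k] + eta for k in labels], dtype=float)
                k = labels[rng.choice(len(labels), p=pk / pk.sum())]
                ls = list(m_kl[k])
                pl = np.array([m_kl[k][l] for l in ls] + [gamma[k]], dtype=float)
                idx = rng.choice(len(pl), p=pl / pl.sum())
                l = ls[idx] if idx < len(ls) else len(ls)  # a new dish gets the next index
                n_jt.append(0)
                dishes.append((k, l))
                m_jk[k] += 1
                m_kl[k][l] += 1
            n_jt[t] += 1
            t_j.append(t)
        seating.append((t_j, dishes))
    return seating
```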

Experiments
• DP-MRM learns a probabilistic label-topic correspondence automatically.

via [Kim+ 12]

via [Kim+ 12]

• L-LDA can also handle single-labeled documents if a common second label is assigned to every document.

References
• [Kim+ ICML2012] Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data
• [Ramage+ EMNLP2009] Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora
• [Blei+ JMLR2003] Latent Dirichlet Allocation


Inference (1)
• Sampling $t$

Inference (2)
• Sampling $k$ and $l$
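For the step of sampling $t$, here is a hedged sketch of the conditional, derived by analogy with the CRF Gibbs sampler of the standard HDP rather than copied from [Kim+ 12]. An existing table $t$ with dish $(k, l)$ gets weight $n_{jt} f_{kl}(w)$; a new table gets $\alpha$ times the likelihood marginalized over $k$ (weight $\propto m_{jk} + \eta$) and over $l$ (weight $\propto m_{kl}$, or $\gamma_k$ for a new dish whose likelihood is $1/V$ under a symmetric $\mathrm{Dir}(\beta)$). All counts must already exclude the current term $x_{ji}$, and all names are hypothetical.

```python
import numpy as np


def table_probs(w, n_jt, table_dish, labels, m_jk, m_kl, n_klw, n_kl,
                alpha, gamma, eta, beta, V):
    """Normalized p(t_ji = t | rest) for word w; the last entry is a new table."""
    def f(k, l):  # predictive probability of w under dish (k, l), H = Dir(beta)
        return (n_klw.get((k, l, w), 0) + beta) / (n_kl.get((k, l), 0) + V * beta)

    p_old = [n * f(k, l) for n, (k, l) in zip(n_jt, table_dish)]
    # new table: lambda_j integrated out over labels, G_0k integrated out over dishes
    pk = np.array([m_jk.get(k, 0) + eta for k in labels], dtype=float)
    pk /= pk.sum()
    p_new = 0.0
    for k, p_k in zip(labels, pk):
        dishes_k = m_kl.get(k, {})
        m_k = sum(dishes_k.values())
        lik = sum(m * f(k, l) for l, m in dishes_k.items()) + gamma[k] / V
        p_new += p_k * lik / (m_k + gamma[k])
    p = np.array(p_old + [alpha * p_new])
    return p / p.sum()
```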