Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log
and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses
the co-clustering problem as an optimization problem in information theory — the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm
that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous
word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
Information-Theoretic Co-Clustering
1. Information-Theoretic Co-Clustering Authors / Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha Conference / ACM SIGKDD ’03, August 24-27, 2003, Washington, DC Presenter / Meng-Lun Wu
2. Outline: Introduction; Problem Formulation; Co-Clustering Algorithm; Experimental Results; Conclusions and Future Work
3. Introduction Clustering is a fundamental tool in unsupervised learning. Most clustering algorithms focus on one-way clustering, i.e., clustering the rows or the columns of a data matrix alone. (Slide figure: an illustration of one-way clustering.)
4. Introduction (cont.) It is often desirable to co-cluster, i.e., simultaneously cluster, both dimensions. We view the normalized non-negative contingency table as a joint probability distribution between two discrete random variables (see the sketch below). The optimal co-clustering is the one that leads to the largest mutual information between the clustered random variables.
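The normalization step is just a division by the grand total. A minimal sketch in Python; the count matrix here is illustrative, chosen so that the normalized table reproduces the running example whose marginals appear on slide 10:

```python
import numpy as np

# Illustrative word-by-document co-occurrence counts (rows = words,
# columns = documents); normalizing reproduces the running example's
# joint distribution p(X, Y).
counts = np.array([[5, 5, 5, 0, 0, 0],
                   [5, 5, 5, 0, 0, 0],
                   [0, 0, 0, 5, 5, 5],
                   [0, 0, 0, 5, 5, 5],
                   [4, 4, 0, 4, 4, 4],
                   [4, 4, 4, 0, 4, 4]])

# Dividing by the grand total yields an empirical joint probability
# distribution p(X, Y) over the two discrete random variables.
p = counts / counts.sum()
assert np.isclose(p.sum(), 1.0)
```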
5. Introduction (cont.) Equivalently, the optimal co-clustering is the one that minimizes the loss in mutual information. The mutual information of two random variables measures their mutual dependence. Formally, it is defined as

I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}
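A direct translation of this definition into code (a minimal sketch; log base 2 is an assumption, giving the value in bits):

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    px = p.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = p.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    mask = p > 0                        # convention: 0 * log 0 = 0
    return float((p[mask] * np.log2(p[mask] / (px @ py)[mask])).sum())
```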
6. Introduction (cont.) The Kullback-Leibler (KL) divergence measures the difference between two probability distributions. Given the true distribution p(x, y) and an approximating distribution q(x, y), it is defined as

D(p \,\|\, q) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{q(x, y)}
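The same style of sketch for the KL divergence (again using log base 2 by assumption):

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) = sum_{x,y} p(x,y) * log2( p(x,y) / q(x,y) ).

    Assumes q(x,y) > 0 wherever p(x,y) > 0, which holds for the
    approximating distributions q used in this paper.
    """
    mask = p > 0                        # terms with p(x,y) = 0 contribute 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())
```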
7. Problem formulation Let X and Y be discrete random variables taking values in {x1, …, xm} and {y1, …, yn}, respectively, and let p(X, Y) denote their joint probability distribution. Let the k clusters of X be {x̂1, x̂2, …, x̂k} and the l clusters of Y be {ŷ1, ŷ2, …, ŷl}.
8. Problem formulation (cont.) Definition: an optimal co-clustering minimizes the loss in mutual information

\Delta I = I(X; Y) - I(\hat{X}; \hat{Y})

subject to constraints on the number of row and column clusters. For a fixed co-clustering (C_X, C_Y), this loss can be written as a KL divergence, \Delta I = D(p(X, Y) \,\|\, q(X, Y)), where q(X, Y) is the approximating distribution defined below.
10. Problem formulation (cont.) q(X, Y) is a distribution of the form

q(x, y) = p(\hat{x}, \hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y}), \qquad x \in \hat{x},\ y \in \hat{y}

The slide illustrates this with a worked example whose row marginals are p(x) = (0.15, 0.15, 0.15, 0.15, 0.2, 0.2), column marginals p(y) = (0.18, 0.18, 0.14, 0.14, 0.18, 0.18), row-cluster marginals p(x̂) = (0.3, 0.3, 0.4), and column-cluster marginals p(ŷ) = (0.5, 0.5).
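A sketch of this construction, reusing p, mutual_information, and kl_divergence from the snippets above; the cluster assignments are the ones from the example (rows {x1,x2}, {x3,x4}, {x5,x6}; columns {y1,y2,y3}, {y4,y5,y6}). The final assertion checks Lemma 2.1 of the paper, ΔI = D(p ‖ q):

```python
import numpy as np

def approximation_q(p, rows, cols, k, l):
    """q(x,y) = p(xhat,yhat) * p(x|xhat) * p(y|yhat) for a fixed co-clustering."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    R, C = np.eye(k)[rows], np.eye(l)[cols]   # cluster indicator matrices
    p_hat = R.T @ p @ C                        # p(xhat, yhat)
    p_x_given = px / (R.T @ px)[rows]          # p(x | xhat)
    p_y_given = py / (C.T @ py)[cols]          # p(y | yhat)
    return p_hat[np.ix_(rows, cols)] * np.outer(p_x_given, p_y_given)

rows = np.array([0, 0, 1, 1, 2, 2])   # xhat1={x1,x2}, xhat2={x3,x4}, xhat3={x5,x6}
cols = np.array([0, 0, 0, 1, 1, 1])   # yhat1={y1,y2,y3}, yhat2={y4,y5,y6}
q = approximation_q(p, rows, cols, 3, 2)

# Lemma 2.1: the loss in mutual information equals D(p || q).
p_hat = np.eye(3)[rows].T @ p @ np.eye(2)[cols]
assert np.isclose(mutual_information(p) - mutual_information(p_hat),
                  kl_divergence(p, q))
```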
11. Co-Clustering Algorithm Input: the joint probability distribution p(X, Y), the desired number of row clusters k, and the desired number of column clusters l. Output: the partition functions C_X^† and C_Y^†. A sketch of the iteration follows.
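The algorithm alternates row and column reassignment steps, each of which cannot increase the loss. A minimal sketch under the definitions above; it uses a fixed iteration count instead of the paper's convergence test, random initialization, and natural logs (only argmin comparisons matter here):

```python
import numpy as np

def coclustering(p, k, l, n_iters=20, seed=0):
    """Alternating row/column reassignment; each sweep cannot increase
    the loss I(X;Y) - I(Xhat;Yhat)."""
    rng = np.random.default_rng(seed)
    m, n = p.shape
    rows = rng.integers(k, size=m)        # initial row-cluster labels
    cols = rng.integers(l, size=n)        # initial column-cluster labels
    px, py = p.sum(axis=1), p.sum(axis=0)
    tiny = 1e-300                         # guard against division/log of zero

    def pairwise_kl(P, Q):
        """D(P_i || Q_j) for all row pairs, with 0 * log 0 := 0."""
        P3, Q3 = P[:, None, :], Q[None, :, :]
        ratio = np.maximum(P3, tiny) / np.maximum(Q3, tiny)
        return np.where(P3 > 0, P3 * np.log(ratio), 0.0).sum(axis=2)

    for _ in range(n_iters):
        # Row step: x -> argmin_xhat D( p(Y|x) || q(Y|xhat) ),
        # where q(y|xhat) = p(y | yhat(y)) * q(yhat | xhat).
        p_hat = np.eye(k)[rows].T @ p @ np.eye(l)[cols]        # p(xhat, yhat)
        q_yh_given_xh = p_hat / np.maximum(p_hat.sum(axis=1, keepdims=True), tiny)
        p_y_given_yh = py / np.maximum((np.eye(l)[cols].T @ py)[cols], tiny)
        qY = q_yh_given_xh[:, cols] * p_y_given_yh             # k x n
        rows = pairwise_kl(p / np.maximum(px[:, None], tiny), qY).argmin(axis=1)

        # Column step (symmetric): y -> argmin_yhat D( p(X|y) || q(X|yhat) ).
        p_hat = np.eye(k)[rows].T @ p @ np.eye(l)[cols]        # recompute with new rows
        q_xh_given_yh = p_hat / np.maximum(p_hat.sum(axis=0, keepdims=True), tiny)
        p_x_given_xh = px / np.maximum((np.eye(k)[rows].T @ px)[rows], tiny)
        qX = (q_xh_given_yh[rows, :] * p_x_given_xh[:, None]).T   # l x m
        cols = pairwise_kl((p / np.maximum(py[None, :], tiny)).T, qX).argmin(axis=1)

    return rows, cols
```

On the 6 x 6 example above, coclustering(p, 3, 2) typically recovers the block structure {x1,x2}, {x3,x4}, {x5,x6} and {y1,y2,y3}, {y4,y5,y6}, up to a relabeling of the clusters and depending on the random initialization.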
15. Experimental results For our experiments we use various subsets of the 20-Newsgroups dataset (NG20). We use 1D-clustering to denote document clustering without any word clustering. Evaluation measures: micro-averaged precision and micro-averaged recall (sketched below).
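A hedged sketch of one common way to compute micro-averaged precision; mapping each cluster to its majority class is an assumption about the exact bookkeeping, and with uni-labeled documents that are all assigned to some cluster, precision and recall coincide:

```python
import numpy as np

def micro_averaged_precision(cluster_labels, true_labels):
    """Fraction of documents whose cluster's majority class matches
    their true class. true_labels are assumed integer-coded."""
    cluster_labels = np.asarray(cluster_labels)
    true_labels = np.asarray(true_labels)
    correct = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        correct += np.bincount(members).max()   # majority-class count
    return correct / len(true_labels)
```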
19. Conclusions and Future Work The information-theoretic co-clustering algorithm is guaranteed to reach a local minimum of the loss in a finite number of steps. It co-clusters the joint distribution of two discrete random variables. In this paper, the numbers of row and column clusters are pre-specified; we hope that an information-theoretic regularization procedure may allow us to select them automatically.