SlideShare une entreprise Scribd logo
1  sur  19
Information-theoretic co-clustering Authors / Inderjit S. Dhillon, SubramanyamMallela and Dharmendra S. Modha Conference / ACM SIGKDD ’03, August 24-27, 2003, Washington Presenter / Meng-Lun, Wu 1
Outline Introduction Problem Formulation Co-Clustering Algorithm Experimental Result Conclusions And Future Work 2
Introduction (cont.) Clustering is a fundamental tool in unsupervised learning. Most clustering algorithms focus on one-way clustering. Clustering 3
Introduction (cont.) It is often desirable to co-cluster or simultaneously cluster both dimensions. The normalized non-negative contingency table into a joint probability distribution between two discrete random variables. The optimal co-clustering is one that leads to the largest mutual information between the clustered random variables. 4
Introduction (cont.) The optimal co-clustering is one that minimizes the loss in mutual information. The mutual information of two random variables is a quantity that measures the mutual dependence of the two variables. Formally, the mutual information can be defined as: 5
Introduction (cont.) The Kullback-Leibler (K-L) divergence, measures the difference between two probability distributions. Given the true probability distribution p(x,y) and another distribution q(x,y) can be defined as: 6
Problem formulation Let X and Y be discrete random variables. X: {x1,…,xm}, Y: {y1,…,yn} p(X, Y) denote the joint probability distribution. Let the k clusters of X as:  Let the l clusters of Y as: {ŷ1, ŷ2, . . . , ŷl} 7
Problem formulation (cont.) Definition  An optimal co-clustering minimizes Subject to constraints on the number of row and column clusters. For a fixed co-clustering (CX,CY), we can write the loss in mutual information. 8
Problem formulation (cont.) 9
Problem formulation (cont.) q(X,Y) is a distribution of the form 0.18   0.18  0.14   0.14   0.18  0.18 0.5   0.5 0.15 0.15 0.15 0.15 0.2 0.2 10 0.3 0.3 0.4 Suppose
Co-CLUSTERING Algorithm Input :  The joint probability distribution p(X,Y), k the desired number of row clusters and l the desired number of column clusters. Output: The partition functions C†X and C†Y 11
Co-CLUSTERING Algorithm (cont.) 12 ^x3^x1 ^x3^x2
Co-CLUSTERING Algorithm (cont.) 13 ŷ2 ŷ1 ŷ1 ŷ2
Co-CLUSTERING Algorithm (cont.) 14 D(p||q)=0.02881
Experimental results For our experimental results we use various subsets of the 20-Newsgroup data(NG20). We use 1D-clustering to denote document clustering without any word clustering. Evaluation Measures Micro-averaged-precision Micro-averaged-recall 15
Experimental results (cont.) 16
Experimental results (cont.) 17
Experimental results (cont.) 18
CONCLUSIONS AND FUTURE WORK The information-theoretic formulation for co-clustering can be guaranteed to reach a local minimum in a finite number of steps. Co-clustering for joint distribution of two random variables. In this paper, the row and column clusters are pre-specified. We hope that an information-theoretic regularization procedure may allow us to select the number of clusters. 19

Contenu connexe

Tendances

Learning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identificationLearning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identificationNAVER Engineering
 
[Pr12] dann jaejun yoo
[Pr12] dann   jaejun yoo[Pr12] dann   jaejun yoo
[Pr12] dann jaejun yooJaeJun Yoo
 
DLLab 異常検知ナイト 資料 20180214
DLLab 異常検知ナイト 資料 20180214DLLab 異常検知ナイト 資料 20180214
DLLab 異常検知ナイト 資料 20180214Kosuke Nakago
 
量的データの分析・報告で気をつけたいこと
量的データの分析・報告で気をつけたいこと量的データの分析・報告で気をつけたいこと
量的データの分析・報告で気をつけたいことMizumoto Atsushi
 
SSD no banco de dados é bom mesmo?
SSD no banco de dados é bom mesmo?SSD no banco de dados é bom mesmo?
SSD no banco de dados é bom mesmo?pichiliani
 
ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」Akifumi Eguchi
 
Rによるprincomp関数を使わない主成分分析
Rによるprincomp関数を使わない主成分分析Rによるprincomp関数を使わない主成分分析
Rによるprincomp関数を使わない主成分分析wada, kazumi
 
フリーソフトウェアを通じた多変量解析講習
フリーソフトウェアを通じた多変量解析講習フリーソフトウェアを通じた多変量解析講習
フリーソフトウェアを通じた多変量解析講習h_yama2396
 
Twitterユーザに対するゼロショットタグ付け
Twitterユーザに対するゼロショットタグ付けTwitterユーザに対するゼロショットタグ付け
Twitterユーザに対するゼロショットタグ付けKohei Shinden
 
みどりぼん読書会 第4章
みどりぼん読書会 第4章みどりぼん読書会 第4章
みどりぼん読書会 第4章Masanori Takano
 
Liver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFWonjoongCheon
 

Tendances (14)

Learning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identificationLearning Disentangled Representation for Robust Person Re-identification
Learning Disentangled Representation for Robust Person Re-identification
 
[Pr12] dann jaejun yoo
[Pr12] dann   jaejun yoo[Pr12] dann   jaejun yoo
[Pr12] dann jaejun yoo
 
DLLab 異常検知ナイト 資料 20180214
DLLab 異常検知ナイト 資料 20180214DLLab 異常検知ナイト 資料 20180214
DLLab 異常検知ナイト 資料 20180214
 
量的データの分析・報告で気をつけたいこと
量的データの分析・報告で気をつけたいこと量的データの分析・報告で気をつけたいこと
量的データの分析・報告で気をつけたいこと
 
Jokyokai
JokyokaiJokyokai
Jokyokai
 
SSD no banco de dados é bom mesmo?
SSD no banco de dados é bom mesmo?SSD no banco de dados é bom mesmo?
SSD no banco de dados é bom mesmo?
 
ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」
 
Rによるprincomp関数を使わない主成分分析
Rによるprincomp関数を使わない主成分分析Rによるprincomp関数を使わない主成分分析
Rによるprincomp関数を使わない主成分分析
 
フリーソフトウェアを通じた多変量解析講習
フリーソフトウェアを通じた多変量解析講習フリーソフトウェアを通じた多変量解析講習
フリーソフトウェアを通じた多変量解析講習
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
 
Twitterユーザに対するゼロショットタグ付け
Twitterユーザに対するゼロショットタグ付けTwitterユーザに対するゼロショットタグ付け
Twitterユーザに対するゼロショットタグ付け
 
みどりぼん読書会 第4章
みどりぼん読書会 第4章みどりぼん読書会 第4章
みどりぼん読書会 第4章
 
Liver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TFLiver segmentation using U-net: Practical issues @ SNU-TF
Liver segmentation using U-net: Practical issues @ SNU-TF
 
正準相関分析
正準相関分析正準相関分析
正準相関分析
 

Similaire à Information Theoretic Co Clustering

11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.pptSueMiu
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptSubrata Kumer Paul
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
 
Fuzzy c-Means Clustering Algorithms
Fuzzy c-Means Clustering AlgorithmsFuzzy c-Means Clustering Algorithms
Fuzzy c-Means Clustering AlgorithmsJustin Cletus
 
Scalable Constrained Spectral Clustering
Scalable Constrained Spectral ClusteringScalable Constrained Spectral Clustering
Scalable Constrained Spectral Clustering1crore projects
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep LearningRayKim51
 
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXING
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXINGCOMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXING
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXINGcsandit
 
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...Salah Amean
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMWireilla
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMijfls
 
Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS TECSI FEA USP
 
CS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningCS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningssuserb02eff
 
CS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptCS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptHathiramN1
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learningAnil Yadav
 
Free vibration analysis of composite plates with uncertain properties
Free vibration analysis of composite plates  with uncertain propertiesFree vibration analysis of composite plates  with uncertain properties
Free vibration analysis of composite plates with uncertain propertiesUniversity of Glasgow
 
Edd clustering algorithm for
Edd clustering algorithm forEdd clustering algorithm for
Edd clustering algorithm forcsandit
 
EDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKS
EDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKSEDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKS
EDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKScscpconf
 

Similaire à Information Theoretic Co Clustering (20)

11ClusAdvanced.ppt
11ClusAdvanced.ppt11ClusAdvanced.ppt
11ClusAdvanced.ppt
 
11 clusadvanced
11 clusadvanced11 clusadvanced
11 clusadvanced
 
Chapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.pptChapter 11. Cluster Analysis Advanced Methods.ppt
Chapter 11. Cluster Analysis Advanced Methods.ppt
 
A PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering AlgorithmA PSO-Based Subtractive Data Clustering Algorithm
A PSO-Based Subtractive Data Clustering Algorithm
 
Fuzzy c-Means Clustering Algorithms
Fuzzy c-Means Clustering AlgorithmsFuzzy c-Means Clustering Algorithms
Fuzzy c-Means Clustering Algorithms
 
Scalable Constrained Spectral Clustering
Scalable Constrained Spectral ClusteringScalable Constrained Spectral Clustering
Scalable Constrained Spectral Clustering
 
Bayesian Deep Learning
Bayesian Deep LearningBayesian Deep Learning
Bayesian Deep Learning
 
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXING
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXINGCOMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXING
COMPARISON OF VOLUME AND DISTANCE CONSTRAINT ON HYPERSPECTRAL UNMIXING
 
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
Data Mining: Concepts and techniques: Chapter 11,Review: Basic Cluster Analys...
 
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS Automated Clustering Project - 12th CONTECSI 34th WCARS
Automated Clustering Project - 12th CONTECSI 34th WCARS
 
Ica group 3[1]
Ica group 3[1]Ica group 3[1]
Ica group 3[1]
 
CS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learningCS583-unsupervised-learning.ppt learning
CS583-unsupervised-learning.ppt learning
 
CS583-unsupervised-learning.ppt
CS583-unsupervised-learning.pptCS583-unsupervised-learning.ppt
CS583-unsupervised-learning.ppt
 
15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning15857 cse422 unsupervised-learning
15857 cse422 unsupervised-learning
 
Free vibration analysis of composite plates with uncertain properties
Free vibration analysis of composite plates  with uncertain propertiesFree vibration analysis of composite plates  with uncertain properties
Free vibration analysis of composite plates with uncertain properties
 
Edd clustering algorithm for
Edd clustering algorithm forEdd clustering algorithm for
Edd clustering algorithm for
 
EDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKS
EDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKSEDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKS
EDD CLUSTERING ALGORITHM FOR WIRELESS SENSOR NETWORKS
 

Plus de AllenWu

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Collaborative filtering with CCAM
Collaborative filtering with CCAMCollaborative filtering with CCAM
Collaborative filtering with CCAMAllenWu
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsAllenWu
 
Co-clustering with augmented data
Co-clustering with augmented dataCo-clustering with augmented data
Co-clustering with augmented dataAllenWu
 
Ch4.mapreduce algorithm design
Ch4.mapreduce algorithm designCh4.mapreduce algorithm design
Ch4.mapreduce algorithm designAllenWu
 
地震知識
地震知識地震知識
地震知識AllenWu
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixAllenWu
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decompositionAllenWu
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisAllenWu
 

Plus de AllenWu (9)

A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Collaborative filtering with CCAM
Collaborative filtering with CCAMCollaborative filtering with CCAM
Collaborative filtering with CCAM
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
 
Co-clustering with augmented data
Co-clustering with augmented dataCo-clustering with augmented data
Co-clustering with augmented data
 
Ch4.mapreduce algorithm design
Ch4.mapreduce algorithm designCh4.mapreduce algorithm design
Ch4.mapreduce algorithm design
 
地震知識
地震知識地震知識
地震知識
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrix
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decomposition
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual Analysis
 

Dernier

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Dernier (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Information Theoretic Co Clustering

  • 1. Information-theoretic co-clustering Authors / Inderjit S. Dhillon, SubramanyamMallela and Dharmendra S. Modha Conference / ACM SIGKDD ’03, August 24-27, 2003, Washington Presenter / Meng-Lun, Wu 1
  • 2. Outline Introduction Problem Formulation Co-Clustering Algorithm Experimental Result Conclusions And Future Work 2
  • 3. Introduction (cont.) Clustering is a fundamental tool in unsupervised learning. Most clustering algorithms focus on one-way clustering. Clustering 3
  • 4. Introduction (cont.) It is often desirable to co-cluster or simultaneously cluster both dimensions. The normalized non-negative contingency table into a joint probability distribution between two discrete random variables. The optimal co-clustering is one that leads to the largest mutual information between the clustered random variables. 4
  • 5. Introduction (cont.) The optimal co-clustering is one that minimizes the loss in mutual information. The mutual information of two random variables is a quantity that measures the mutual dependence of the two variables. Formally, the mutual information can be defined as: 5
  • 6. Introduction (cont.) The Kullback-Leibler (K-L) divergence, measures the difference between two probability distributions. Given the true probability distribution p(x,y) and another distribution q(x,y) can be defined as: 6
  • 7. Problem formulation Let X and Y be discrete random variables. X: {x1,…,xm}, Y: {y1,…,yn} p(X, Y) denote the joint probability distribution. Let the k clusters of X as: Let the l clusters of Y as: {ŷ1, ŷ2, . . . , ŷl} 7
  • 8. Problem formulation (cont.) Definition An optimal co-clustering minimizes Subject to constraints on the number of row and column clusters. For a fixed co-clustering (CX,CY), we can write the loss in mutual information. 8
  • 10. Problem formulation (cont.) q(X,Y) is a distribution of the form 0.18 0.18 0.14 0.14 0.18 0.18 0.5 0.5 0.15 0.15 0.15 0.15 0.2 0.2 10 0.3 0.3 0.4 Suppose
  • 11. Co-CLUSTERING Algorithm Input : The joint probability distribution p(X,Y), k the desired number of row clusters and l the desired number of column clusters. Output: The partition functions C†X and C†Y 11
  • 12. Co-CLUSTERING Algorithm (cont.) 12 ^x3^x1 ^x3^x2
  • 13. Co-CLUSTERING Algorithm (cont.) 13 ŷ2 ŷ1 ŷ1 ŷ2
  • 14. Co-CLUSTERING Algorithm (cont.) 14 D(p||q)=0.02881
  • 15. Experimental results For our experimental results we use various subsets of the 20-Newsgroup data(NG20). We use 1D-clustering to denote document clustering without any word clustering. Evaluation Measures Micro-averaged-precision Micro-averaged-recall 15
  • 19. CONCLUSIONS AND FUTURE WORK The information-theoretic formulation for co-clustering can be guaranteed to reach a local minimum in a finite number of steps. Co-clustering for joint distribution of two random variables. In this paper, the row and column clusters are pre-specified. We hope that an information-theoretic regularization procedure may allow us to select the number of clusters. 19