Extracting biclusters of similar values with Triadic Concept Analysis
M. Kaytoue, S. O. Kuznetsov,
J. Macko, W. Meira Jr. et A. Napoli
Bordeaux, 31 January - 3 February 2012
Extraction et Gestion des Connaissances - EGC 2012
Context
Knowledge Discovery in Databases
Biclustering numerical data
Numerical data and bicluster
Given a numerical dataset (G, M, W , I)
–object/attribute data-table–
G a set of objects (rows)
M a set of attributes (columns)
W a set of values
I ⊆ G × M × W a relation s.t. (g, m, w) ∈ I, written m(g) = w,
means that object g takes the value w for attribute m
–simply represents data-cells–
a bicluster is a pair (A, B) with A ⊆ G and B ⊆ M.
–a rectangle in the data-table–
Biclustering numerical data
Example
Given a dataset (G, M, W , I) with
G = {g1, g2, g3, g4}
M = {m1, m2, m3, m4, m5}
W = {0, 1, 2, 6, 7, 8, 9}
and e.g. m2(g4) = 9
the bicluster ({g2, g3, g4}, {m3, m4}) can be viewed as the gray
rectangle
m1 m2 m3 m4 m5
g1 1 2 2 1 6
g2 2 1 1 0 6
g3 2 2 1 7 6
g4 8 9 2 6 7
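The example table can be encoded directly. The minimal Python sketch below (the names `DATA` and `bicluster_values` are ours, not from the paper) stores m(g) as a nested dictionary and reads off the cells of the bicluster ({g2, g3, g4}, {m3, m4}), i.e. the rectangle A × B.

```python
# Example dataset from the slide: DATA[g][m] = m(g)
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def bicluster_values(data, objects, attributes):
    """Return the sub-table A x B of a bicluster (A, B), one row per object."""
    return [[data[g][m] for m in sorted(attributes)] for g in sorted(objects)]


# W is the set of values occurring in the table.
W = sorted({v for row in DATA.values() for v in row.values()})

print(W)                 # [0, 1, 2, 6, 7, 8, 9]
print(DATA["g4"]["m2"])  # 9, i.e. m2(g4) = 9
print(bicluster_values(DATA, {"g2", "g3", "g4"}, {"m3", "m4"}))  # [[1, 0], [1, 7], [2, 6]]
```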
Biclustering numerical data
But... a bicluster should reflect
a local phenomenon in the data: “rectangles of values”
connectedness of values: e.g. similar values
overlapping: objects/attributes may belong to several patterns
a partial order, e.g. for algorithmic issues
maximality of rectangles w.r.t. connectedness and ordering
Several types of biclusters
Biclustering numerical data
Several applications
Collaborative filtering and recommender systems
Finding web communities
Discovery of association rules in databases
Gene expression analysis, ...
Several algorithms
Iterative Row and Column Clustering Combination
Divide and Conquer / Distribution Parameter Identification
Greedy Iterative Search / Exhaustive Bicluster Enumeration
A difficult problem generally relying on heuristics
S. C. Madeira and A. L. Oliveira
Biclustering Algorithms for Biological Data Analysis: a survey.
In IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
Introducing similarity
A simple similarity relation
w1 ≃θ w2 ⇐⇒ |w1 − w2| ≤ θ, with θ ∈ ℝ, θ ≥ 0 and w1, w2 ∈ W
Considered type of biclusters
A bicluster (A, B) is a bicluster of similar values if
mi(gj) ≃θ mk(gl), ∀gj, gl ∈ A, ∀mi, mk ∈ B
m1 m2 m3 m4 m5
g1 1 2 2 1 6
g2 2 1 1 0 6
g3 2 2 1 7 6
g4 8 9 2 6 7
(with θ = 2)
and maximal if no object or attribute can be added without breaking this condition
J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut
Mining Bi-sets in Numerical Data.
In KDID 2006: 11-23.
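The definition above can be checked by brute force on small tables. The sketch below (hypothetical helpers `is_similar` and `is_maximal`; a direct check of the definition, not the NBS-Miner algorithm of the cited paper) relies on the fact that, for numbers, "all pairs within θ" is the same as "max − min ≤ θ".

```python
# Example dataset from the slide: DATA[g][m] = m(g)
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def is_similar(data, objects, attributes, theta):
    """All pairs of cells in A x B are within theta, i.e. max - min <= theta.

    Assumes A and B are non-empty.
    """
    values = [data[g][m] for g in objects for m in attributes]
    return max(values) - min(values) <= theta


def is_maximal(data, objects, attributes, theta):
    """Similar, and no single object or attribute can be added.

    Testing single additions is enough: adding a subset of a valid extension
    only adds a subset of its values, so the spread cannot grow.
    """
    if not is_similar(data, objects, attributes, theta):
        return False
    all_attributes = set(next(iter(data.values())))
    if any(is_similar(data, set(objects) | {g}, attributes, theta)
           for g in set(data) - set(objects)):
        return False
    return not any(is_similar(data, objects, set(attributes) | {m}, theta)
                   for m in all_attributes - set(attributes))


print(is_maximal(DATA, {"g1", "g2", "g3"}, {"m1", "m2", "m3"}, 2))  # True
print(is_maximal(DATA, {"g1", "g2"}, {"m1", "m2", "m3"}, 2))        # False: g3 fits
```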
Formal Concept Analysis (Ganter & Wille, 1999)
From a formal context to a concept lattice...
m1 m2 m3
g1 × ×
g2 × ×
g3 × ×
g4 × ×
g5 × × ×
Formal concepts = maximal rectangles
... with interesting properties (and existing algorithms!)
Maximality of concepts as rectangles
Overlapping of concepts
Specialization/generalisation hierarchy
This is exactly what we need for biclustering
Contribution
FCA: an interesting framework for biclustering
Use FCA for a complete, correct and non-redundant extraction
of biclusters of similar values with lossless discretization
with no set similarity parameter (useful for top-k pattern
discovery)
with a given similarity parameter (as in the literature)
Design an algorithm that is
more efficient than its competitors
easy to distribute
able to handle several constraints (e.g. size) on the fly
A better understanding of closed numerical pattern mining
Outline
1 Formal Concept Analysis (FCA)
2 A first FCA-based biclustering method
3 Algorithm TriMax
4 Experiments
5 Conclusion and perspectives
Formal Concept Analysis (FCA)
In a nutshell...
FCA
A data analysis theory rooted in order and lattice theory that
characterizes formal concepts (also known as closed itemsets)
A concept in a formal context
Formal context (G, M, I): objects, attributes, incidence relation
Two derivation operators define formal concepts
A concept is a maximal rectangle of ×, modulo column and row
permutations
m1 m2 m3
g1 × ×
g2 × ×
g3 × ×
g4 × ×
g5 × × ×
({g3, g4, g5}, {m2, m3}) is a formal concept
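The two derivation operators can be written in a few lines. Because the crosses of the table above were lost in extraction, the sketch below uses a hypothetical context chosen only to be consistent with the concept ({g3, g4, g5}, {m2, m3}); the function names are ours, and the naive enumeration is for toy contexts only.

```python
from itertools import combinations

# Hypothetical context: the crosses of the slide's table did not survive
# extraction, so this relation is an assumption that is merely consistent
# with the concept ({g3, g4, g5}, {m2, m3}) given on the slide.
CONTEXT = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m3"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
ATTRIBUTES = {"m1", "m2", "m3"}


def extent(attrs):
    """B': the objects having every attribute of B."""
    return {g for g, row in CONTEXT.items() if set(attrs) <= row}


def intent(objs):
    """A': the attributes shared by every object of A (all attributes if A is empty)."""
    shared = set(ATTRIBUTES)
    for g in objs:
        shared &= CONTEXT[g]
    return shared


def is_concept(objs, attrs):
    """(A, B) is a formal concept iff A' = B and B' = A (a maximal rectangle)."""
    return intent(objs) == set(attrs) and extent(attrs) == set(objs)


def all_concepts():
    """Naive enumeration: every intent is the closure B'' of some attribute set B."""
    found = set()
    for k in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(sorted(ATTRIBUTES), k):
            objs = extent(attrs)
            found.add((frozenset(objs), frozenset(intent(objs))))
    return found


print(is_concept({"g3", "g4", "g5"}, {"m2", "m3"}))  # True
print(is_concept({"g3", "g4"}, {"m2", "m3"}))        # False: g5 can be added
print(len(all_concepts()))                            # 8 for this toy context
```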
Formal Concept Analysis (FCA)
Triadic Concept Analysis (Lehmann &
Wille, 1995)
“Extension” of FCA to ternary relations
An object has an attribute under a given condition
Triadic context (G, M, B, Y): objects, attributes, conditions,
incidence relation
Several derivation operators characterize “triadic
concepts” as maximal cubes of ×
Condition b1
m1 m2 m3
g1 ×
g2 × ×
g3 × ×
g4 × ×
g5 × ×
Condition b2
m1 m2 m3
g1 × × ×
g2 × ×
g3 × × ×
g4 × ×
g5 × ×
Condition b3
m1 m2 m3
g1 × ×
g2 ×
g3 × × ×
g4 × ×
g5 × × ×
({g3, g4, g5}, {m2, m3}, {b1, b2, b3}) is a triadic concept
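A triadic concept can be recognized by checking that each component is exactly what the two others allow. The sketch below uses a hypothetical triadic context (the crosses above were lost in extraction) built so that ({g3, g4, g5}, {m2, m3}, {b1, b2, b3}) is a concept; `is_triadic_concept` is our own name.

```python
from itertools import product

# Hypothetical triadic context (the slide's crosses were lost in extraction):
# (g, m, b) in Y means "object g has attribute m under condition b".
G = {"g1", "g2", "g3", "g4", "g5"}
M = {"m1", "m2", "m3"}
B = {"b1", "b2", "b3"}
Y = set(product({"g3", "g4", "g5"}, {"m2", "m3"}, B))
Y |= {("g1", "m1", "b1"), ("g1", "m1", "b2"), ("g2", "m2", "b2"), ("g5", "m1", "b3")}


def is_triadic_concept(objs, attrs, conds):
    """Each component must equal the set of elements related to every pair drawn
    from the two other components; this makes (A1, A2, A3) a maximal cube of crosses."""
    objs, attrs, conds = set(objs), set(attrs), set(conds)
    return (
        {g for g in G if all((g, m, b) in Y for m, b in product(attrs, conds))} == objs
        and {m for m in M if all((g, m, b) in Y for g, b in product(objs, conds))} == attrs
        and {b for b in B if all((g, m, b) in Y for g, m in product(objs, attrs))} == conds
    )


print(is_triadic_concept({"g3", "g4", "g5"}, {"m2", "m3"}, {"b1", "b2", "b3"}))  # True
print(is_triadic_concept({"g3", "g4", "g5"}, {"m2", "m3"}, {"b1", "b2"}))        # False: b3 can be added
```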
A first FCA-based biclustering method
Basic idea
Principle
Start from a numerical dataset
Build a triadic context with the same objects, the same attributes, and
a lossless discretized “numerical space” dimension
Extract triadic concepts
We show interesting links between biclusters of similar
values and triadic concepts
A first FCA-based biclustering method
Discretization method
Interordinal scaling (an existing discretization scale)
Let (G, M, W, I) be a numerical dataset (with W the set of
data values).
Now consider the set
T = {[min(W), w], ∀w ∈ W} ∪ {[w, max(W)], ∀w ∈ W}.
Known fact: every interval [a, b] with a, b ∈ W is an intersection
of elements of T, namely [min(W), b] ∩ [a, max(W)].
Example
With W = {0, 1, 2, 6, 7, 8, 9}, one has
T = {[0, 0], [0, 1], [0, 2], [0, 6], ..., [1, 9], [2, 9], ..., [9, 9]}
and for example [0, 8] ∩ [2, 9] = [2, 8]
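The scale and the "known fact" can be checked mechanically. The sketch below (hypothetical names; intervals stored as (low, high) pairs) builds T for W = {0, 1, 2, 6, 7, 8, 9} and verifies that intersecting all elements of T containing [a, b] gives back [a, b] for every a ≤ b in W.

```python
from itertools import combinations_with_replacement

W = [0, 1, 2, 6, 7, 8, 9]


def interordinal_scale(values):
    """Return T = {[min W, w]} ∪ {[w, max W]} as sorted (low, high) pairs."""
    lo, hi = min(values), max(values)
    return sorted({(lo, w) for w in values} | {(w, hi) for w in values})


def intersect(intervals):
    """Intersection of closed intervals; None if it is empty."""
    low = max(a for a, _ in intervals)
    high = min(b for _, b in intervals)
    return (low, high) if low <= high else None


def tightest(a, b, scale):
    """Intersect all elements of T that contain [a, b]."""
    return intersect([t for t in scale if t[0] <= a and b <= t[1]])


T = interordinal_scale(W)
print(len(T))                       # 13: [0, 9] belongs to both halves
print(intersect([(0, 8), (2, 9)]))  # (2, 8)
# Every interval whose endpoints lie in W is an intersection of elements of T.
assert all(tightest(a, b, T) == (a, b) for a, b in combinations_with_replacement(W, 2))
```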
A first FCA-based biclustering method
Building a triadic context
Transformation procedure
From a numerical dataset (G, M, W, I), build a triadic context
(G, M, T, Y) such that (g, m, t) ∈ Y ⇐⇒ m(g) ∈ t
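The transformation above can be run on the example dataset. In the sketch below (hypothetical names; intervals stored as (low, high) pairs), the bicluster ({g1, g2, g3}, {m1, m2, m3}), whose values lie in [1, 2] and are therefore similar for θ = 1, is completed by the set C of intervals containing all its values; the resulting triple (A, B, C) is closed in all three dimensions. This illustrates the idea on one example only and is not a proof.

```python
# Example dataset from the slide: DATA[g][m] = m(g)
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
M = {"m1", "m2", "m3", "m4", "m5"}
W = sorted({v for row in DATA.values() for v in row.values()})
LO, HI = W[0], W[-1]
# Interordinal scale T as (low, high) pairs.
T = sorted({(LO, w) for w in W} | {(w, HI) for w in W})
# (g, m, t) in Y iff m(g) lies in interval t.
Y = {(g, m, t) for g, row in DATA.items() for m, v in row.items()
     for t in T if t[0] <= v <= t[1]}


def modus(objs, attrs):
    """Conditions shared by every cell of A x B: intervals containing all its values."""
    return {t for t in T if all((g, m, t) in Y for g in objs for m in attrs)}


def objects_of(attrs, conds):
    return {g for g in DATA if all((g, m, t) in Y for m in attrs for t in conds)}


def attributes_of(objs, conds):
    return {m for m in M if all((g, m, t) in Y for g in objs for t in conds)}


A, B = {"g1", "g2", "g3"}, {"m1", "m2", "m3"}
C = modus(A, B)
print(sorted(C))  # [(0, 2), (0, 6), (0, 7), (0, 8), (0, 9), (1, 9)]
print(objects_of(B, C) == A and attributes_of(A, C) == B)  # True: (A, B, C) is closed
low, high = max(t[0] for t in C), min(t[1] for t in C)
print((low, high))  # (1, 2): the range of values of the bicluster
```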
A first FCA-based biclustering method
First contribution
We proved that there is a one-to-one correspondence between
(i) Triadic concepts of the resulting triadic context
(ii) Biclusters of similar values maximal for some θ ≥ 0
Interesting facts
Efficient algorithm for concepts extraction (Data-Peeler)
L. Cerf, J. Besson, C. Robardet, J.-F. Boulicaut
Closed patterns meet n-ary relations.
In TKDD 3(1): (2009).
This algorithm can handle several constraints
Top-k biclusters: Concept (A, B, C) with high |A|, |B|, and |C|
corresponds to bicluster (A, B) as a large rectangle of close
values (by properties of interordinal scale)
This formalization allows us to design a new algorithm to
extract maximal biclusters for a given parameter θ
Algorithm TriMax
Compute all maximal biclusters for a given θ
Principle
Use another (but similar) discretization procedure to build the
triadic context based on tolerance blocks
Standard triadic concept extraction outputs biclusters of similar values, but not
necessarily maximal ones
We design a new algorithm TriMax for that task
TriMax is flexible, uses standard FCA algorithms in its
core and is better than its competitors
Algorithm TriMax
Finding maximal set of similar values
≃θ is a tolerance relation:
reflexive, symmetric, but not transitive
Blocks of tolerance of W
Maximal sets of pairwise similar values are closed sets
Example with θ = 1 (× marks w1 ≃1 w2)
≃1 | 0  1  2  6  7  8  9
0  | ×  ×
1  | ×  ×  ×
2  |    ×  ×
6  |          ×  ×
7  |          ×  ×  ×
8  |             ×  ×  ×
9  |                ×  ×
Blocks of tolerance and their renamed classes
{0, 1} → [0, 1]
{1, 2} → [1, 2]
{6, 7} → [6, 7]
{7, 8} → [7, 8]
{8, 9} → [8, 9]
S. O. Kuznetsov
Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research.
In Formal Concept Analysis, Foundations and Applications, 2005.
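Because ≃θ compares single numbers, blocks of tolerance of W can be found without a general FCA algorithm: sort W and slide a window. The sketch below (hypothetical function `tolerance_blocks`, a one-dimensional shortcut of our own rather than the method of the cited reference) reproduces the five blocks listed above for θ = 1.

```python
def tolerance_blocks(values, theta):
    """Blocks of tolerance of W for |w1 - w2| <= theta, returned as (low, high).

    On the real line a set of values is pairwise similar iff max - min <= theta,
    so blocks are the maximal runs of consecutive sorted values with that spread.
    """
    w = sorted(set(values))
    blocks, last_end, j = [], -1, 0
    for i in range(len(w)):
        j = max(j, i)
        # Extend the window starting at w[i] as far as the spread allows.
        while j + 1 < len(w) and w[j + 1] - w[i] <= theta:
            j += 1
        # A window ending no later than the previous block is contained in it.
        if j > last_end:
            blocks.append((w[i], w[j]))
            last_end = j
    return blocks


W = [0, 1, 2, 6, 7, 8, 9]
print(tolerance_blocks(W, 1))  # [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]
print(tolerance_blocks(W, 2))  # [(0, 2), (6, 8), (7, 9)]
```

The two-pointer loop runs in linear time after sorting, since the window end never moves backwards.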
Algorithm TriMax
New transformation procedure
Tolerance blocks based scaling
Compute the set C of all blocks of tolerance over W
From the numerical dataset (G, M, W , I), build the triadic
context (G, M, C, Z) such that (g, m, c) ∈ Z ⇐⇒ m(g) ∈ c
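In effect, “useless information” is removed: every condition is now a set of pairwise similar values, so intervals grouping dissimilar values (such as [0, 9] in the interordinal scale) no longer appear
[Figure: example of the resulting triadic context, θ = 1]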
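To make the tolerance-block scaling concrete, the sketch below builds Z for the example dataset of the earlier slides, whose value set W = {0, 1, 2, 6, 7, 8, 9} is the one used for the θ = 1 blocks above. The names `Z` and `block_context` are hypothetical, and measuring density as |Z| / (|G| · |M| · |C|) is our assumption about what the experiments report.

```python
# Example dataset from the slide: DATA[g][m] = m(g)
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
# Tolerance blocks of W for theta = 1, as listed on the slide, stored as (low, high).
BLOCKS = [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]

# (g, m, c) in Z iff m(g) belongs to block c. Each block is a run of consecutive
# values of W, so membership reduces to low <= m(g) <= high.
Z = {(g, m, c) for g, row in DATA.items() for m, v in row.items()
     for c in BLOCKS if c[0] <= v <= c[1]}


def block_context(block):
    """Dyadic slice of Z for one block: object -> attributes whose value lies in it."""
    ctx = {}
    for g, m, c in Z:
        if c == block:
            ctx.setdefault(g, set()).add(m)
    return ctx


density = len(Z) / (len(DATA) * len(DATA["g1"]) * len(BLOCKS))
print(len(Z), density)  # 28 0.28
print(sorted((g, sorted(ms)) for g, ms in block_context((1, 2)).items()))
```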
Algorithm TriMax
Second contribution
Algorithm TriMax
Any triadic concept corresponds to a bicluster of similar values,
but not necessarily maximal!
This led us to the algorithm TriMax, which:
processes each formal context (one for each block of tolerance)
with any existing FCA algorithm;
treats any resulting concept as a maximal bicluster candidate, with a
simple procedure to check maximality (this may be
problematic, but experiments show good behaviour);
processes each context independently of the others.
TriMax allows a complete, correct and non-redundant
extraction of all maximal biclusters of similar values for a
user-defined similarity parameter θ
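The TriMax loop described above can be sketched on the example dataset. In this hedged sketch all names are hypothetical, a naive closure-based enumeration stands in for InClose or any other FCA algorithm, and the maximality test is our own straightforward check (a candidate (A, B) is kept iff it is a concept in the context of every block of its modus), not necessarily the procedure used by TriMax.

```python
from itertools import combinations

# Example dataset from the slide: DATA[g][m] = m(g)
DATA = {
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}


def tolerance_blocks(values, theta):
    """Maximal runs of sorted distinct values with spread <= theta, as (low, high)."""
    w = sorted(set(values))
    blocks, last_end, j = [], -1, 0
    for i in range(len(w)):
        j = max(j, i)
        while j + 1 < len(w) and w[j + 1] - w[i] <= theta:
            j += 1
        if j > last_end:
            blocks.append((w[i], w[j]))
            last_end = j
    return blocks


def block_context(data, block):
    """Dyadic context of one block: g has m iff m(g) lies in the block."""
    low, high = block
    return {g: {m for m, v in row.items() if low <= v <= high} for g, row in data.items()}


def closure(ctx, attributes, attrs):
    """Return (B', B'') in the dyadic context ctx."""
    objs = frozenset(g for g, row in ctx.items() if attrs <= row)
    shared = set(attributes)
    for g in objs:
        shared &= ctx[g]
    return objs, frozenset(shared)


def concepts(ctx, attributes):
    """Naive stand-in for InClose: close every attribute subset (toy sizes only)."""
    return {closure(ctx, attributes, set(attrs))
            for k in range(len(attributes) + 1)
            for attrs in combinations(sorted(attributes), k)}


def trimax_sketch(data, theta):
    attributes = set(next(iter(data.values())))
    values = [v for row in data.values() for v in row.values()]
    blocks = tolerance_blocks(values, theta)
    contexts = {c: block_context(data, c) for c in blocks}
    result = set()
    for c in blocks:  # blocks are independent, hence easy to distribute
        for objs, attrs in concepts(contexts[c], attributes):
            if not objs or not attrs:
                continue
            cells = [data[g][m] for g in objs for m in attrs]
            low, high = min(cells), max(cells)
            # Modus: every block containing all values of the candidate.
            modus = [d for d in blocks if d[0] <= low and high <= d[1]]
            if c != modus[0]:
                continue  # report each bicluster once, from the first block of its modus
            # Maximality (our own check): (A, B) must be a concept in every block of its modus.
            if all(closure(contexts[d], attributes, set(attrs)) == (objs, attrs) for d in modus):
                result.add((objs, attrs))
    return result


RESULT = trimax_sketch(DATA, 1)
print(len(RESULT))  # 9
```

The check is sound because any similar extension of (A, B) has values lying in some block that also contains the values of (A, B), and in that block's context (A, B) would then not be closed. For example, ({g2, g3}, {m3}) is a concept for block [0, 1] but is rejected, since in the context of block [1, 2] it extends to ({g1, g2, g3, g4}, {m3}).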
Experiments
TriMax - settings
Implementation: C++, Boost library 1.42
InClose algorithm for processing dyadic contexts
Data: gene expression data of the species Laccaria bicolor
Configuration: Intel CPU 2.54 GHz, 8 GB RAM
Experiments
TriMax - monitoring aspects
Starting with all 12 attributes, we vary the number of
objects and the similarity parameter θ, and monitor:
Number of maximal biclusters of similar values
Execution time (in seconds)
Number of tolerance blocks
Density of the triadic context
Comparison between the number of non-maximal biclusters with
the number of maximal biclusters
Execution time profiling of the main procedures of TriMax
Experiments
TriMax - experimental results
[Figure panels: number of maximal biclusters; execution time (s); number of tolerance blocks; density of the triadic context; number of generated biclusters; execution time]
Experiments
TriMax bottleneck
Computing the modus is problematic. For each block of tolerance, TriMax
builds a formal (2D) context,
extracts its concepts (A, B),
computes the modus C to get the triadic concept (A, B, C) and
checks maximality
But...
In many applications, experts have preferences
A bicluster candidate can be discarded before the modus
computation according to some constraints
Example with θ = 33,000, 500 objects, 12 attributes
104,226 maximal biclusters extracted in 16.130 s
5,332 maximal biclusters in 2.1 s with at least 10 (at last 40)
objects
Experiments
Comparison
Existing algorithms
Numerical Biset Miner (NBS-Miner) - not scalable
J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut
Mining Bi-sets in Numerical Data.
In KDID 2006: 11-23.
Interval Pattern Structures (IPS) - less efficient than TriMax
M. Kaytoue, S. O. Kuznetsov, and A. Napoli
Biclustering Numerical Data in Formal Concept Analysis.
ICFCA, Springer, 2011.
Experiments
An example of comparison
Increasing number of objects, all 12 attributes.
Results in milliseconds.
[Figure: execution times for θ = 0, θ = 700 and θ = 10000]
Other scenarios show similar behaviour.
Conclusion and perspectives
Conclusion
Contribution
A better understanding of closed numerical pattern mining
within FCA
A formal characterization of a type of bicluster
TriMax for efficient computation
Perspectives
top-k bicluster discovery
n-dimensional numerical datasets
Distributed computation
Constraints (size, mean-square residue, etc.)
Links with Fuzzy FCA
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Extracting biclusters of similar values with Triadic Concept Analysis

  • 1. Extraction of biclusters of similar values using triadic concept analysis. M. Kaytoue, S. O. Kuznetsov, J. Macko, W. Meira Jr. and A. Napoli. Bordeaux, 31 January to 3 February 2012. Extraction et Gestion des Connaissances - EGC 2012.
  • 2. Context: Knowledge Discovery in Databases.
  • 3. Biclustering numerical data: numerical data and biclusters. A numerical dataset (G, M, W, I) is an object/attribute data table where G is a set of objects (rows), M is a set of attributes (columns), W is a set of values, and I ⊆ G × M × W is a relation such that (g, m, w) ∈ I, written m(g) = w, means that object g takes the value w for attribute m (I simply represents the data cells). A bicluster is a pair (A, B) with A ⊆ G and B ⊆ M, i.e. a rectangle in the data table.
  • 4. Biclustering numerical data: example. Consider the dataset (G, M, W, I) with G = {g1, g2, g3, g4}, M = {m1, m2, m3, m4, m5}, W = {0, 1, 2, 6, 7, 8, 9} and, for example, m2(g4) = 9. The bicluster ({g2, g3, g4}, {m3, m4}) is the rectangle formed by rows g2, g3, g4 and columns m3, m4 (shown in gray on the slide):
          m1  m2  m3  m4  m5
      g1   1   2   2   1   6
      g2   2   1   1   0   6
      g3   2   2   1   7   6
      g4   8   9   2   6   7
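As an illustration only, the example dataset can be encoded as a nested dictionary in which `data[g][m]` is m(g); the names `rows`, `data` and `bicluster_cells` are hypothetical and not taken from the original work. A bicluster (A, B) is then simply the sub-table of rows A and columns B.

```python
# Example dataset (G, M, W, I) from the slide, encoded as data[g][m] = m(g).
G = ["g1", "g2", "g3", "g4"]
M = ["m1", "m2", "m3", "m4", "m5"]
rows = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 0, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}
data = {g: dict(zip(M, rows[g])) for g in G}

# The relation I as a set of triples (g, m, w) with m(g) = w: one per data cell.
I = {(g, m, data[g][m]) for g in G for m in M}

# W: the set of values occurring in the table, sorted.
W = sorted({w for (_, _, w) in I})


def bicluster_cells(A, B):
    """Return the cells of the rectangle A x B, one list per object in A."""
    return [[data[g][m] for m in B] for g in A]


print(W)                 # [0, 1, 2, 6, 7, 8, 9]
print(data["g4"]["m2"])  # 9
print(bicluster_cells(["g2", "g3", "g4"], ["m3", "m4"]))  # [[1, 0], [1, 7], [2, 6]]
```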
  • 5. Biclustering numerical data: desired properties. A bicluster should reflect a local phenomenon in the data ("rectangles of values"); connectedness of values, e.g. similar values; overlapping, since objects/attributes may belong to several patterns; a partial order, e.g. for algorithmic purposes; and maximality of rectangles w.r.t. connectedness and ordering. This leads to several types of biclusters.
  • 6. Biclustering numerical data: applications and algorithms. Applications include collaborative filtering and recommender systems, finding web communities, discovery of association rules in databases, gene expression analysis, etc. Algorithm families include iterative row and column clustering combination, divide and conquer, distribution parameter identification, greedy iterative search, and exhaustive bicluster enumeration. Biclustering is a difficult problem, and most approaches rely on heuristics. Reference: S. C. Madeira and A. L. Oliveira. Biclustering Algorithms for Biological Data Analysis: a Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
  • 7. Introducing similarity. A simple similarity relation: $w_1 \simeq_\theta w_2 \iff |w_1 - w_2| \le \theta$, with $\theta \in \mathbb{R}$ and $w_1, w_2 \in W$. Considered type of biclusters: a bicluster (A, B) is a bicluster of similar values if $m_i(g_j) \simeq_\theta m_k(g_l)$ for all $g_j, g_l \in A$ and all $m_i, m_k \in B$, and it is maximal if no object or attribute can be added without violating this condition. [Figure: the example data table with a bicluster of similar values highlighted, for θ = 2.] Reference: J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut. Mining Bi-sets in Numerical Data. KDID 2006: 11-23.
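A minimal sketch of the two definitions, assuming the example table is stored as `data[g][m]` (the helper names `is_similar` and `is_maximal` are hypothetical). Requiring every pair of values to be within θ is the same as requiring max − min ≤ θ over the rectangle.

```python
# Example data table from the slides, encoded as data[g][m] = m(g).
M = ["m1", "m2", "m3", "m4", "m5"]
rows = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 0, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}
data = {g: dict(zip(M, vals)) for g, vals in rows.items()}


def is_similar(A, B, theta):
    """True if every pair of values in A x B differs by at most theta.

    Checking all pairs is equivalent to max - min <= theta.
    Empty rectangles are rejected.
    """
    values = [data[g][m] for g in A for m in B]
    return bool(values) and max(values) - min(values) <= theta


def is_maximal(A, B, theta):
    """True if (A, B) is a bicluster of similar values to which no single
    object or attribute can be added while keeping all values similar."""
    if not is_similar(A, B, theta):
        return False
    if any(is_similar(A | {g}, B, theta) for g in data if g not in A):
        return False
    return not any(is_similar(A, B | {m}, theta) for m in M if m not in B)


print(is_similar({"g2", "g3", "g4"}, {"m3", "m4"}, 2))        # False (values 0 and 7)
print(is_maximal({"g1", "g2", "g3"}, {"m1", "m2", "m3"}, 2))  # True
print(is_maximal({"g1", "g2"}, {"m1", "m2"}, 2))              # False (g3 or m3 can be added)
```

Testing single-element extensions is enough: similarity is inherited by sub-rectangles, so if a larger similar bicluster existed, adding any one of its extra objects or attributes would already succeed.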
  • 8. Formal Concept Analysis (Ganter & Wille, 1999): from a formal context to a concept lattice. [Figure: a formal context over objects g1 to g5 and attributes m1 to m3, and its concept lattice.] Formal concepts are maximal rectangles, and they come with interesting properties (and existing algorithms): maximality of concepts as rectangles, overlapping of concepts, and a specialization/generalization hierarchy. This is exactly what we need for biclustering.
  • 9. Contribution. FCA is an interesting framework for biclustering. We use FCA for a complete, correct and non-redundant extraction of biclusters of similar values, with a lossless discretization, either without fixing a similarity parameter (useful for top-k pattern discovery) or with a given similarity parameter (as in the literature). We design an algorithm that outperforms its competitors, can easily be distributed, and can handle several constraints (e.g. size) on the fly. This yields a better understanding of closed numerical pattern mining.
  • 10. Outline: (1) Formal Concept Analysis (FCA); (2) A first FCA-based biclustering method; (3) Algorithm TriMax; (4) Experiments; (5) Conclusion and perspectives.
  • 11. Formal Concept Analysis (FCA) in a nutshell. FCA is a data analysis theory rooted in order and lattice theory that characterizes formal concepts (also known as closed itemsets). A formal context (G, M, I) consists of objects, attributes, and an incidence relation. Two derivation operators define formal concepts: for A ⊆ G, A′ = {m ∈ M | (g, m) ∈ I for all g ∈ A}, and dually for B ⊆ M, B′ = {g ∈ G | (g, m) ∈ I for all m ∈ B}; (A, B) is a formal concept iff A′ = B and B′ = A. A concept is thus a maximal rectangle of × in the context table, modulo row and column permutations. [Figure: example formal context over objects g1 to g5 and attributes m1 to m3.] ({g3, g4, g5}, {m2, m3}) is a formal concept.
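The derivation operators can be sketched directly on sets. The slide's table is not reproduced here, so the context below is hypothetical: the rows of g1 and g2 are assumptions, chosen only so that ({g3, g4, g5}, {m2, m3}) is a concept as stated. The enumeration is naive (it closes every attribute subset) and is not one of the efficient FCA algorithms the slides refer to.

```python
from itertools import combinations

# Hypothetical formal context: g1 and g2 rows are assumptions for illustration.
context = {
    "g1": {"m1", "m2"},
    "g2": {"m1", "m3"},
    "g3": {"m2", "m3"},
    "g4": {"m2", "m3"},
    "g5": {"m1", "m2", "m3"},
}
M = {"m1", "m2", "m3"}


def objects_prime(A):
    """A' : the attributes shared by all objects in A (all of M if A is empty)."""
    shared = set(M)
    for g in A:
        shared &= context[g]
    return shared


def attributes_prime(B):
    """B' : the objects that have every attribute in B."""
    return {g for g, attrs in context.items() if set(B) <= attrs}


def is_concept(A, B):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return objects_prime(A) == set(B) and attributes_prime(B) == set(A)


def all_concepts():
    """Naive enumeration: close every attribute subset B into (B', B'')."""
    found = set()
    for k in range(len(M) + 1):
        for B in combinations(sorted(M), k):
            extent = attributes_prime(B)
            found.add((frozenset(extent), frozenset(objects_prime(extent))))
    return found


print(is_concept({"g3", "g4", "g5"}, {"m2", "m3"}))  # True
print(is_concept({"g3", "g4"}, {"m2", "m3"}))        # False: g5 also has m2 and m3
print(len(all_concepts()))                           # 8
```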
  • 12. Formal Concept Analysis (FCA): Triadic Concept Analysis (Lehmann & Wille, 1995). An "extension" of FCA to ternary relations: an object has an attribute under a given condition. A triadic context (G, M, B, Y) consists of objects, attributes, conditions, and an incidence relation Y ⊆ G × M × B. Several derivation operators characterize "triadic concepts" as maximal cubes of ×. [Figure: a triadic context over objects g1 to g5, attributes m1 to m3 and conditions b1, b2, b3, drawn as one table per condition.] ({g3, g4, g5}, {m2, m3}, {b1, b2, b3}) is a triadic concept.
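Following the slide's characterization of triadic concepts as maximal cubes of ×, a brute-force check can be sketched as below. The triadic context is hypothetical: only the cube of the concept named on the slide comes from the source, and the four extra triples are assumptions added so that the example is not trivial.

```python
from itertools import product

# Hypothetical triadic context (G, M, B, Y); the extra triples are assumptions.
G = {"g1", "g2", "g3", "g4", "g5"}
M = {"m1", "m2", "m3"}
B = {"b1", "b2", "b3"}
Y = set(product({"g3", "g4", "g5"}, {"m2", "m3"}, {"b1", "b2", "b3"}))
Y |= {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m3", "b2"), ("g5", "m1", "b3")}


def is_cube(A1, A2, A3):
    """True if A1 x A2 x A3 is entirely contained in Y (a cube of x)."""
    return all(t in Y for t in product(A1, A2, A3))


def is_triadic_concept(A1, A2, A3):
    """A cube that cannot be enlarged in any of the three dimensions.

    Trying single-element extensions suffices: if a larger cube existed,
    adding one of its extra elements would also give a cube.
    """
    if not is_cube(A1, A2, A3):
        return False
    return not (
        any(is_cube(A1 | {g}, A2, A3) for g in G - A1)
        or any(is_cube(A1, A2 | {m}, A3) for m in M - A2)
        or any(is_cube(A1, A2, A3 | {b}) for b in B - A3)
    )


print(is_triadic_concept({"g3", "g4", "g5"}, {"m2", "m3"}, {"b1", "b2", "b3"}))  # True
print(is_triadic_concept({"g3", "g4"}, {"m2", "m3"}, {"b1", "b2", "b3"}))        # False
```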
  • 13. Outline, section 2: A first FCA-based biclustering method.
  • 14. A first FCA-based biclustering method: basic idea. Start from a numerical dataset; build a triadic context with the same objects, the same attributes, and a discretized, lossless "numerical space" dimension; then extract triadic concepts. We show interesting links between biclusters of similar values and triadic concepts.
  • 15. A first FCA-based biclustering method: discretization. We use interordinal scaling, an existing discretization scale. Let (G, M, W, I) be a numerical dataset (with W the set of data values), and consider the set T = {[min(W), w] | w ∈ W} ∪ {[w, max(W)] | w ∈ W}. Known fact: T and all intersections of its elements characterize any interval of values on W. Example: with W = {0, 1, 2, 6, 7, 8, 9}, one has T = {[0, 0], [0, 1], [0, 2], [0, 6], ..., [0, 9], [1, 9], [2, 9], ..., [9, 9]}, and for example [0, 8] ∩ [2, 9] = [2, 8].
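A small sketch of interordinal scaling on the example value set, representing each interval as a `(left, right)` pair (a representation chosen here for illustration). It builds T, reproduces the slide's intersection example, and checks that every interval [a, b] with endpoints in W is the intersection [min(W), b] ∩ [a, max(W)].

```python
# Interordinal scale over the example value set W, intervals as (left, right).
W = [0, 1, 2, 6, 7, 8, 9]
lo, hi = min(W), max(W)

# T = {[min W, w] | w in W} U {[w, max W] | w in W}; the set union removes
# the duplicate [0, 9], which appears in both halves.
T = sorted({(lo, w) for w in W} | {(w, hi) for w in W})


def intersect(s, t):
    """Intersection of two closed intervals, or None if it is empty."""
    left, right = max(s[0], t[0]), min(s[1], t[1])
    return (left, right) if left <= right else None


print(len(T))                     # 13
print(intersect((0, 8), (2, 9)))  # (2, 8)

# Every interval [a, b] with a <= b in W equals [min W, b] ∩ [a, max W].
print(all(intersect((lo, b), (a, hi)) == (a, b) for a in W for b in W if a <= b))  # True
```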
  • 16. A first FCA-based biclustering method: building a triadic context. From a numerical dataset (G, M, W, I), build the triadic context (G, M, T, Y) such that (g, m, t) ∈ Y ⟺ m(g) ∈ t.
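The transformation can be sketched as a set comprehension over the example table, assuming intervals are `(left, right)` pairs and the triadic relation is stored as a Python set of triples (both representation choices are ours, not the authors').

```python
# Example data table from the slides: rows[g] lists m1(g), ..., m5(g).
M = ["m1", "m2", "m3", "m4", "m5"]
rows = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 0, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}

# Interordinal scale T over the values W, intervals as (left, right) pairs.
W = sorted({w for vals in rows.values() for w in vals})
T = sorted({(W[0], w) for w in W} | {(w, W[-1]) for w in W})

# (g, m, t) in Y  <=>  m(g) lies in the interval t.
Y = {
    (g, m, t)
    for g, vals in rows.items()
    for m, w in zip(M, vals)
    for t in T
    if t[0] <= w <= t[1]
}

print(len(Y))                     # 140: each of the 20 cells lies in 7 intervals
print(("g4", "m2", (8, 9)) in Y)  # True, since m2(g4) = 9
print(("g4", "m2", (0, 8)) in Y)  # False
```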
  • 17. A first FCA-based biclustering method: first contribution. We proved that there is a one-to-one correspondence between (i) the triadic concepts of the resulting triadic context and (ii) the biclusters of similar values that are maximal for some θ ≥ 0. Interesting facts: an efficient algorithm exists for extracting these concepts (Data-Peeler; L. Cerf, J. Besson, C. Robardet, J.-F. Boulicaut. Closed patterns meet n-ary relations. TKDD 3(1), 2009), and this algorithm can handle several constraints. Top-k biclusters: a concept (A, B, C) with large |A|, |B| and |C| corresponds to a bicluster (A, B) that is a large rectangle of close values (by the properties of the interordinal scale). This formalization also allows us to design a new algorithm that extracts the maximal biclusters for a given parameter θ.
  • 18. Outline, section 3: Algorithm TriMax.
  • 19. Algorithm TriMax: computing all maximal biclusters for a given θ. Principle: use another, similar discretization procedure, based on blocks of tolerance, to build the triadic context. Standard algorithms then output biclusters of similar values, but these are not necessarily maximal. We therefore design a new algorithm, TriMax, for this task. TriMax is flexible, uses standard FCA algorithms at its core, and outperforms its competitors.
  • 20. Algorithm TriMax: finding maximal sets of similar values. The relation $\simeq_\theta$ is a tolerance relation: reflexive and symmetric, but not transitive. The blocks of tolerance of W are the maximal sets of pairwise similar values; they are closed sets. Example with θ = 1 (× marks pairs of similar values):
      ≃1     0  1  2  6  7  8  9
      0      ×  ×
      1      ×  ×  ×
      2         ×  ×
      6                ×  ×
      7                ×  ×  ×
      8                   ×  ×  ×
      9                      ×  ×
    Blocks of tolerance: {0, 1}, {1, 2}, {6, 7}, {7, 8}, {8, 9}, renamed as the classes [0, 1], [1, 2], [6, 7], [7, 8], [8, 9]. Reference: S. O. Kuznetsov. Galois Connections in Data Analysis: Contributions from the Soviet Era and Modern Russian Research. In Formal Concept Analysis: Foundations and Applications, 2005.
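For numbers with $|w_1 - w_2| \le \theta$, each block of tolerance is a run of consecutive sorted values, so a single left-to-right sweep finds all blocks. This is a specialized sketch for the one-dimensional case (the function name `tolerance_blocks` is hypothetical), not a general algorithm for blocks of an arbitrary tolerance relation.

```python
def tolerance_blocks(W, theta):
    """Blocks of tolerance of W for |w1 - w2| <= theta.

    Each block is a maximal window of consecutive sorted values whose
    spread is at most theta; one sweep over the sorted values finds them.
    """
    values = sorted(set(W))
    blocks = []
    j = 0        # right end of the current window
    prev_j = -1  # right end of the previous window
    for i, w in enumerate(values):
        j = max(j, i)
        # Extend the window starting at values[i] as far as theta allows.
        while j + 1 < len(values) and values[j + 1] - w <= theta:
            j += 1
        # The window is maximal unless it ends where the previous one ended
        # (it would then be contained in the previous window).
        if j > prev_j:
            blocks.append(values[i:j + 1])
        prev_j = j
    return blocks


W = [0, 1, 2, 6, 7, 8, 9]
blocks = tolerance_blocks(W, 1)
print(blocks)                           # [[0, 1], [1, 2], [6, 7], [7, 8], [8, 9]]
print([(b[0], b[-1]) for b in blocks])  # [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]
```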
  • 21. Algorithm TriMax: new transformation procedure (scaling based on blocks of tolerance). Compute the set C of all blocks of tolerance over W; then, from the numerical dataset (G, M, W, I), build the triadic context (G, M, C, Z) such that (g, m, c) ∈ Z ⟺ m(g) ∈ c. In effect, compared with interordinal scaling, this removes "useless information". [Figure: the resulting triadic context for θ = 1.]
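A sketch of this scaling on the example table for θ = 1, under the same assumptions as before: blocks are computed by a one-dimensional sweep (hypothetical helper `tolerance_blocks`) and returned as `(smallest, largest)` pairs, i.e. the renamed classes. Because each block is a run of consecutive values of W, interval membership coincides with block membership here.

```python
# Example data table from the slides: rows[g] lists m1(g), ..., m5(g).
M = ["m1", "m2", "m3", "m4", "m5"]
rows = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 0, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}
W = sorted({w for vals in rows.values() for w in vals})


def tolerance_blocks(values, theta):
    """Maximal windows of sorted values whose spread is at most theta,
    returned as (smallest, largest) pairs."""
    values = sorted(set(values))
    blocks, j, prev_j = [], 0, -1
    for i, w in enumerate(values):
        j = max(j, i)
        while j + 1 < len(values) and values[j + 1] - w <= theta:
            j += 1
        if j > prev_j:  # not contained in the previous window
            blocks.append((values[i], values[j]))
        prev_j = j
    return blocks


theta = 1
C = tolerance_blocks(W, theta)

# (g, m, c) in Z  <=>  m(g) lies in block c.
Z = {
    (g, m, c)
    for g, vals in rows.items()
    for m, w in zip(M, vals)
    for c in C
    if c[0] <= w <= c[1]
}

print(C)       # [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]
print(len(Z))  # 28
```

On this toy table each cell lies in one or two blocks, whereas under interordinal scaling every cell lies in seven intervals.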
  • 22. Algorithm TriMax: second contribution. Any triadic concept corresponds to a bicluster of similar values, but not necessarily a maximal one. This led us to the algorithm TriMax, which processes each formal context (one for each block of tolerance) with any existing FCA algorithm. Every resulting concept is a candidate maximal bicluster, and a simple procedure checks its maximality (this step can be a bottleneck, but experiments show good behaviour). Each context can be processed separately. TriMax performs a complete, correct and non-redundant extraction of all maximal biclusters of similar values for a user-defined similarity parameter θ.
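The overall scheme can be sketched as follows, under loud assumptions: concepts of each per-block context are enumerated by brute force (standing in for an efficient FCA algorithm such as InClose), and maximality is tested directly by trying to add one object or attribute, which is not necessarily the check used in TriMax itself. All function names are hypothetical. Every maximal bicluster appears as a concept of the context of some block containing its values, so filtering those concepts by maximality yields all of them; a set removes candidates found in several blocks.

```python
from itertools import combinations

# Example data table from the slides, encoded as data[g][m] = m(g).
G = ["g1", "g2", "g3", "g4"]
M = ["m1", "m2", "m3", "m4", "m5"]
rows = {
    "g1": [1, 2, 2, 1, 6],
    "g2": [2, 1, 1, 0, 6],
    "g3": [2, 2, 1, 7, 6],
    "g4": [8, 9, 2, 6, 7],
}
data = {g: dict(zip(M, rows[g])) for g in G}


def tolerance_blocks(values, theta):
    """Maximal windows of sorted values whose spread is at most theta."""
    values = sorted(set(values))
    blocks, j, prev_j = [], 0, -1
    for i, w in enumerate(values):
        j = max(j, i)
        while j + 1 < len(values) and values[j + 1] - w <= theta:
            j += 1
        if j > prev_j:
            blocks.append((values[i], values[j]))
        prev_j = j
    return blocks


def dyadic_concepts(incidence):
    """All formal concepts of {g: attributes of g}, by closing every attribute
    subset. A brute-force stand-in for an efficient FCA algorithm."""
    concepts = set()
    for k in range(len(M) + 1):
        for B0 in combinations(M, k):
            A = frozenset(g for g in G if set(B0) <= incidence[g])
            B = frozenset(m for m in M if all(m in incidence[g] for g in A))
            concepts.add((A, B))
    return concepts


def is_similar(A, B, theta):
    values = [data[g][m] for g in A for m in B]
    return bool(values) and max(values) - min(values) <= theta


def is_maximal(A, B, theta):
    """Direct maximality test: no single object or attribute can be added."""
    return (
        is_similar(A, B, theta)
        and not any(is_similar(A | {g}, B, theta) for g in G if g not in A)
        and not any(is_similar(A, B | {m}, theta) for m in M if m not in B)
    )


def trimax_sketch(theta):
    """Maximal biclusters of similar values, one dyadic context per block."""
    W = {w for vals in rows.values() for w in vals}
    result = set()
    for lo, hi in tolerance_blocks(W, theta):
        # Dyadic context of this block: g has m iff m(g) lies in [lo, hi].
        incidence = {g: {m for m in M if lo <= data[g][m] <= hi} for g in G}
        for A, B in dyadic_concepts(incidence):
            # Each non-empty concept is a candidate; keep the maximal ones.
            if A and B and is_maximal(A, B, theta):
                result.add((A, B))
    return result


for A, B in sorted(trimax_sketch(2), key=lambda ab: (sorted(ab[0]), sorted(ab[1]))):
    print(sorted(A), sorted(B))
```

Since each block yields an independent dyadic context, the loop over blocks is the natural place to distribute the work.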
  • 23. Outline, section 4: Experiments.
  • 24. Experiments: TriMax settings. Implementation: C++ with the Boost library 1.42, using the InClose algorithm to process the dyadic contexts. Data: gene expression data of the species Laccaria bicolor. Configuration: Intel CPU at 2.54 GHz with 8 GB of RAM.
  • 25. Experiments: TriMax monitoring. Starting with all 12 attributes, we vary the number of objects and the similarity parameter θ, and monitor: the number of maximal biclusters of similar values; the execution time (in seconds); the number of tolerance blocks; the density of the triadic context; the number of non-maximal biclusters compared with the number of maximal biclusters; and an execution-time profile of the main procedures of TriMax.
  • 26. Experiments: TriMax results. [Figures: number of maximal biclusters; execution time (s); number of blocks of tolerance; density of the triadic context; number of generated biclusters; execution-time profile.]
  • 27. Experiments: the TriMax bottleneck. Computing the modus is the problematic step: TriMax builds a dyadic formal context for each block of tolerance, extracts the concepts (A, B) of each, computes the modus C to obtain the triadic concept (A, B, C), and checks maximality. However, in many applications experts have preferences, so a bicluster candidate can be discarded according to some constraints before its modus is computed. Example with θ = 33,000, 500 objects and 12 attributes: 104,226 maximal biclusters are extracted in 16.130 s, versus 5,332 maximal biclusters in 2.1 s with at least 10 (at last 40) objects.
  • 28. Experiments: comparison with existing algorithms. Numerical Biset Miner (NBS-Miner) does not scale (J. Besson, C. Robardet, L. De Raedt, J.-F. Boulicaut. Mining Bi-sets in Numerical Data. KDID 2006: 11-23). Interval Pattern Structures (IPS) are less efficient than TriMax (M. Kaytoue, S. O. Kuznetsov, A. Napoli. Biclustering Numerical Data in Formal Concept Analysis. ICFCA 2011, Springer).
  • 29. Experiments: an example comparison, with an increasing number of objects and all 12 attributes (results in milliseconds). [Figures: execution times for θ = 0, θ = 700 and θ = 10000.] Other scenarios show similar behaviour.
  • 30. Outline, section 5: Conclusion and perspectives.
  • 31. Conclusion and perspectives. Contributions: a better understanding of closed numerical pattern mining within FCA; a formal characterization of a type of bicluster; TriMax for efficient computation. Perspectives: top-k bicluster discovery; n-dimensional numerical datasets; distributed computation; constraints (size, mean squared residue, etc.); links with fuzzy FCA.