Presentation of the CILAB research activity at the CVPL 2018 congress (CVPL2018) of the Associazione Italiana per la ricerca in
Computer Vision, Pattern recognition e machine Learning (CVPL, ex-GIRPR).
Data stream classification by incremental semi-supervised fuzzy clustering
1. Data stream classification by incremental semi-supervised fuzzy clustering
G. Casalino, G. Castellano, C. Castiello, A.M. Fanelli, C. Mencar
CVPL2018
gabriella.casalino@uniba.it
2. CVPL2018 - Vico Equense, Italy, 30-31 August 2018
gabriella.casalino@uniba.it
Data stream classification by incremental semi-supervised fuzzy clustering
Data streams
• Continuous flow of data
• sensors, online transactions, health monitoring, network traffic,…
• Impractical to store and use all data
• Need for new techniques that:
• Process a finite number of data at a time
• Use a limited amount of memory
• Predict/classify at any time and in a limited amount of time
• Take into account the evolution of data
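The first two requirements above (a finite number of data at a time, limited memory) can be illustrated with a minimal sketch. The `chunked` helper and its `chunk_size` parameter are hypothetical names chosen for illustration and are not part of the presented method; the point is only that a single chunk is held in memory at any time.

```python
from itertools import islice

def chunked(stream, chunk_size):
    """Yield successive non-overlapping lists of at most chunk_size items.

    Only the current chunk is materialised, so memory use is bounded
    by chunk_size regardless of how long the stream is.
    """
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:          # stream exhausted
            return
        yield chunk

# Usage: any iterable can stand in for the data stream.
for chunk in chunked(range(7), 3):
    print(chunk)               # [0, 1, 2] then [3, 4, 5] then [6]
```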
3. Proposed method
• DISSFCM: Dynamic Incremental Semi-Supervised Fuzzy C-Means
• a method for data stream classification that
• works in an incremental way
• dynamically adapts the number of clusters:
• a fixed number of clusters may not adequately capture the evolving
structure of streaming data
• uses both labeled and unlabeled data (semi-supervised)
• uses fuzzy logic to describe patterns in data
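To make "works in an incremental way" concrete, here is a minimal sketch in which the prototypes found on one chunk initialise the clustering of the next chunk. It uses plain Fuzzy C-Means as a stand-in: it ignores labels and keeps the number of clusters fixed, so it is not DISSFCM itself. The function names, the fuzzifier `m=2.0`, and the stopping rule are assumptions made for illustration.

```python
import numpy as np

def fcm_memberships(X, V, m=2.0):
    """Standard FCM membership matrix, shape (n_samples, n_clusters)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)               # guard against zero distances
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, V_init, m=2.0, n_iter=100, tol=1e-6):
    """Plain FCM started from the given prototypes; returns updated prototypes."""
    V = np.asarray(V_init, dtype=float).copy()
    for _ in range(n_iter):
        W = fcm_memberships(X, V, m) ** m     # weights u_jk^m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V

def cluster_stream(chunks, V0):
    """Cluster chunks one at a time, warm-starting from the previous prototypes."""
    V = np.asarray(V0, dtype=float)
    history = []
    for X in chunks:
        V = fcm(X, V)                         # previous result seeds this chunk
        history.append(V)
    return history
```

Because each chunk starts from the previous prototypes, the clusters can follow gradual changes in the data without revisiting earlier chunks.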
4. Proposed method
• Based on a semi-supervised fuzzy clustering algorithm
• Applied to subsequent, non-overlapping chunks of
data so as to enable continuous update of clusters
• SSFCM - Semi-Supervised FCM (Pedrycz and Waletzky, 1997)
Supervised component
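For reference, the SSFCM objective function is commonly written in the form below, where the second term is the supervised component. The notation is an assumption and may differ from the one used in the original slide.

```latex
\[
J \;=\; \sum_{j=1}^{c}\sum_{k=1}^{N} u_{jk}^{m}\, d_{jk}^{2}
\;+\; \alpha \sum_{j=1}^{c}\sum_{k=1}^{N} \left(u_{jk} - b_{k} f_{jk}\right)^{m} d_{jk}^{2},
\qquad \sum_{j=1}^{c} u_{jk} = 1 \quad \forall k
\]
```

Here $u_{jk}$ is the membership of sample $x_k$ in cluster $j$, $d_{jk}$ is the distance between $x_k$ and the prototype of cluster $j$, $b_k$ is 1 if $x_k$ is labeled and 0 otherwise, $f_{jk}$ encodes the known class membership of a labeled sample, $m$ is the fuzzifier, and $\alpha \ge 0$ balances the unsupervised and supervised terms. With $\alpha = 0$ the objective reduces to standard Fuzzy C-Means.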
5. Split
• When the cluster quality deteriorates from one data
chunk to the next, the number of clusters is
increased (by splitting some clusters)
• The cluster quality is evaluated in terms of the
reconstruction error (Pedrycz, 2008)
• The cluster with the highest
reconstruction error is split into two clusters
• To find the two new prototypes, conditional fuzzy
clustering is applied to the data belonging to that cluster
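A minimal sketch of how the cluster to split might be selected. It assumes that each sample is rebuilt from the prototypes weighted by its memberships raised to `m`, and that a sample's squared reconstruction error is attributed to the cluster in which it has maximal membership. These details, the standard FCM memberships used here, and all function names are assumptions; the conditional fuzzy clustering that produces the two new prototypes is not shown.

```python
import numpy as np

def fcm_memberships(X, V, m=2.0):
    """Standard FCM membership matrix, shape (n_samples, n_clusters)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)               # guard against zero distances
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def reconstruction_error_per_cluster(X, V, U, m=2.0):
    """Sum of squared reconstruction errors, grouped by dominant cluster."""
    W = U ** m
    X_hat = (W @ V) / W.sum(axis=1, keepdims=True)   # rebuilt samples
    err = ((X - X_hat) ** 2).sum(axis=1)             # per-sample error
    owner = U.argmax(axis=1)                         # dominant cluster
    return np.bincount(owner, weights=err, minlength=V.shape[0])

def cluster_to_split(X, V, m=2.0):
    """Index of the cluster with the highest reconstruction error."""
    U = fcm_memberships(X, V, m)
    return int(reconstruction_error_per_cluster(X, V, U, m).argmax())

# Cluster 0 is tight; cluster 1 covers two distant points and is selected.
X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 0.0], [9.0, 0.0]])
V = np.array([[0.0, 0.05], [7.0, 0.0]])
print(cluster_to_split(X, V))   # 1
```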
6. Merge
• The two nearest clusters whose prototypes share the
same label are merged into one if:
• the number of clusters exceeds a predefined threshold
• the number of data belonging to a cluster is below a
predefined threshold
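The merge step can be sketched as follows. The search for the two nearest prototypes with the same label follows the description above; the way the merged prototype is computed (a mean weighted by cluster size) and all function names are assumptions. The activation conditions are left to the caller.

```python
import numpy as np

def nearest_same_label_pair(V, labels):
    """Return indices (i, j) of the two closest prototypes sharing a label,
    or None if no two prototypes share a label."""
    best, best_d = None, np.inf
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            if labels[i] != labels[j]:
                continue
            d = float(np.linalg.norm(V[i] - V[j]))
            if d < best_d:
                best, best_d = (i, j), d
    return best

def merge_pair(V, labels, counts, pair):
    """Replace the two prototypes in `pair` with one merged prototype.

    Assumption: the merged prototype is the size-weighted mean of the two.
    """
    i, j = pair
    total = counts[i] + counts[j]
    w = counts[i] / total if total > 0 else 0.5
    merged = w * V[i] + (1.0 - w) * V[j]
    keep = [k for k in range(len(V)) if k not in (i, j)]
    V_new = np.vstack([V[keep], merged])
    labels_new = np.append(labels[keep], labels[i])
    counts_new = np.append(counts[keep], total)
    return V_new, labels_new, counts_new

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.0], [5.0, 5.0]])
labels = np.array([0, 1, 0, 0])
counts = np.array([10, 5, 30, 2])
pair = nearest_same_label_pair(V, labels)     # (0, 2)
V_new, labels_new, counts_new = merge_pair(V, labels, counts, pair)
```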
7. DISSFCM
8. Experimental results
• Optical Recognition of Handwritten Digits dataset
• 5620 samples, 10 classes
• Training set: 90%, Test set: 10%
• #Chunk: 5, 10, 15, 20
• %Labeling: 75%
• Splitting tolerance: 25, 50, 100
• Evaluation measure: classification accuracy
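As an illustration of this setup, the sketch below cuts the training portion into non-overlapping chunks and marks a fraction of each chunk as labeled. The random shuffling, the application of the labeling percentage per chunk, and the function name `make_chunks` are assumptions; the slides do not state how the chunks were composed. Only sample indices are generated, so no dataset is loaded.

```python
import numpy as np

def make_chunks(n_train, n_chunks, label_fraction, seed=0):
    """Return a list of (indices, labeled_mask) pairs, one per chunk."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_train)               # assumed: random composition
    chunks = []
    for idx in np.array_split(order, n_chunks):    # near-equal, non-overlapping
        labeled = np.zeros(len(idx), dtype=bool)
        n_labeled = int(round(label_fraction * len(idx)))
        labeled[rng.choice(len(idx), size=n_labeled, replace=False)] = True
        chunks.append((idx, labeled))
    return chunks

n_total = 5620
n_train = n_total * 90 // 100                      # 5058 training samples
chunks = make_chunks(n_train, n_chunks=20, label_fraction=0.75)
print(len(chunks), sum(len(idx) for idx, _ in chunks))   # 20 5058
```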
9. Trend of the reconstruction error
#Chunk=20, %Labeling=75%, SplitTol=25
10. Accuracy values
[Figure: accuracy plots in four panels, for #Chunk=5, #Chunk=10, #Chunk=15 and #Chunk=20]
11. Conclusions
• DISSFCM can:
• learn incrementally from data
• adapt the number of clusters
• inject a priori knowledge into the process
• Future work will investigate:
• the merge activation conditions
• the influence of the chunk composition
• a mechanism to detect outliers, concept drift and the emergence of
new classes.
12. http://www.di.uniba.it/~cilab