Data stream classification by incremental semi-supervised fuzzy clustering


Presentation of the CILAB research activity at CVPL2018, the congress of the Associazione Italiana per la ricerca in Computer Vision, Pattern recognition e machine Learning (CVPL, ex-GIRPR).

1. Data stream classification by incremental semi-supervised fuzzy clustering
G. Casalino, G. Castellano, C. Castiello, A.M. Fanelli, C. Mencar
CVPL2018
gabriella.casalino@uniba.it

2. Data streams
CVPL2018 - Vico Equense, Italy, 30-31 August 2018
gabriella.casalino@uniba.it
• Continuous flow of data
  • Sensors, online transactions, health monitoring, network traffic, …
• Impractical to store and use all data
• Need for new techniques that:
  • Process a finite number of data at a time
  • Use a limited amount of memory
  • Predict/classify at any time and in a limited amount of time
  • Take into account the evolution of data
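The requirements listed on this slide (a finite number of data at a time, limited memory, a model that is usable at any time) can be illustrated with a minimal sketch. It is not taken from the presentation: the names `chunked`, `streaming_mean` and `chunk_size` are hypothetical, and a running mean stands in for the model that would be updated in a real classifier.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(stream: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(stream: Iterable[float], chunk_size: int) -> float:
    """Update a running mean chunk by chunk; only one chunk is held in memory."""
    count, mean = 0, 0.0
    for chunk in chunked(stream, chunk_size):
        for value in chunk:
            count += 1
            mean += (value - mean) / count  # incremental update, no history kept
    return mean


if __name__ == "__main__":
    # A generator stands in for an unbounded source such as a sensor feed.
    readings = (float(v) for v in range(1, 5))
    print(streaming_mean(readings, chunk_size=3))
```

Because `chunked` consumes any iterable lazily, the same loop works on a source that never ends; the estimate held in `mean` can be read after any chunk.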

3. Proposed method
• DISSFCM: Dynamic Incremental Semi-Supervised Fuzzy C-Means
• A method for data stream classification that:
  • works in an incremental way
  • dynamically adapts the number of clusters:
    • a fixed number of clusters may not adequately capture the evolving structure of streaming data
  • uses both unlabeled and labeled data (semi-supervised)
  • uses fuzzy logic to describe patterns in data

4. Proposed method
• Based on a semi-supervised fuzzy clustering algorithm
• Applied to subsequent, non-overlapping chunks of data so as to enable continuous update of clusters
• SSFCM - Semi-Supervised FCM (Pedrycz and Waletzky, 1997)
[Figure label: Supervised component]
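The equation shown on this slide did not survive in the transcript. For orientation, the SSFCM objective function is commonly written as below (fuzzifier fixed to 2); the notation is an assumption and may differ from the slide. Here c is the number of clusters, N the number of data in the chunk, u_ik the membership of point x_k in cluster i, d_ik the distance between x_k and the prototype of cluster i, b_k equals 1 if x_k is labeled and 0 otherwise, f_ik is the membership prescribed by the label of x_k, and alpha >= 0 weights the supervised part.

```latex
J = \underbrace{\sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{2}\, d_{ik}^{2}}_{\text{unsupervised (FCM) component}}
  + \underbrace{\alpha \sum_{i=1}^{c}\sum_{k=1}^{N} \left(u_{ik} - f_{ik}\, b_{k}\right)^{2} d_{ik}^{2}}_{\text{supervised component}}
```

With alpha = 0, or with no labeled data (all b_k = 0 makes the second term a scaled copy of the first), the minimizer coincides with that of standard fuzzy c-means.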

5. Split
• When the cluster quality deteriorates from one data chunk to the next, the number of clusters is increased (by splitting some clusters)
• The cluster quality is evaluated in terms of the reconstruction error (Pedrycz, 2008)
• The cluster with the highest reconstruction error is split into two clusters
• To find the two new prototypes, conditional fuzzy clustering is applied to the data belonging to that cluster
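The choice of the cluster to split can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each point is reconstructed as the membership-weighted average of the prototypes (fuzzifier `m`, default 2), its squared reconstruction error is charged to the cluster where its membership is highest, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def reconstruct(u_k: Sequence[float], prototypes: Sequence[Vector], m: float = 2.0) -> List[float]:
    """Rebuild a point as the u^m-weighted average of the cluster prototypes."""
    weights = [u ** m for u in u_k]
    total = sum(weights)
    dim = len(prototypes[0])
    return [sum(w * p[d] for w, p in zip(weights, prototypes)) / total for d in range(dim)]


def per_cluster_reconstruction_error(
    data: Sequence[Vector],
    memberships: Sequence[Sequence[float]],
    prototypes: Sequence[Vector],
    m: float = 2.0,
) -> List[float]:
    """Sum squared reconstruction errors, charged to each point's dominant cluster."""
    errors = [0.0] * len(prototypes)
    for x, u_k in zip(data, memberships):
        x_hat = reconstruct(u_k, prototypes, m)
        squared_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        owner = max(range(len(u_k)), key=lambda i: u_k[i])  # assumption: max membership
        errors[owner] += squared_error
    return errors


def cluster_to_split(errors: Sequence[float]) -> int:
    """Index of the cluster with the highest reconstruction error."""
    return max(range(len(errors)), key=lambda i: errors[i])


if __name__ == "__main__":
    data = [[0.0], [1.0], [10.0]]
    prototypes = [[0.0], [10.0]]
    memberships = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    errors = per_cluster_reconstruction_error(data, memberships, prototypes)
    print(errors, cluster_to_split(errors))
```

In the toy data only the point at 1.0 is poorly represented by its prototype, so the first cluster accumulates all the error and is the one selected.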

6. Merge
• The two nearest clusters whose prototypes share the same label are merged into one if:
  • the number of clusters exceeds a predefined threshold
  • the number of data belonging to a cluster is below a predefined threshold
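A minimal sketch of the merge step is given below. Several points are assumptions rather than facts from the slide: Euclidean distance between prototypes, either condition alone being enough to trigger a merge, and the hypothetical names `nearest_same_label_pair`, `merge_needed`, `max_clusters` and `min_size`. How the merged prototype is computed is not stated in the slide and is left out.

```python
from itertools import combinations
from math import dist
from typing import Optional, Sequence, Tuple


def nearest_same_label_pair(
    prototypes: Sequence[Sequence[float]], labels: Sequence[str]
) -> Optional[Tuple[int, int]]:
    """Return indices of the two closest prototypes that carry the same label."""
    best_pair: Optional[Tuple[int, int]] = None
    best_distance = float("inf")
    for i, j in combinations(range(len(prototypes)), 2):
        if labels[i] != labels[j]:
            continue  # only clusters of the same class may be merged
        distance = dist(prototypes[i], prototypes[j])
        if distance < best_distance:
            best_pair, best_distance = (i, j), distance
    return best_pair


def merge_needed(cluster_sizes: Sequence[int], max_clusters: int, min_size: int) -> bool:
    """Assumption: a merge is triggered when either threshold is violated."""
    too_many = len(cluster_sizes) > max_clusters
    too_small = min(cluster_sizes) < min_size
    return too_many or too_small


if __name__ == "__main__":
    prototypes = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [5.0, 5.0]]
    labels = ["a", "b", "a", "a"]
    if merge_needed([10, 3, 8, 9], max_clusters=5, min_size=5):
        print(nearest_same_label_pair(prototypes, labels))
```

When no two prototypes share a label the function returns `None`, so the caller can skip the merge.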

7. DISSFCM

8. Experimental results
• Optical Recognition of Handwritten Digits dataset
  • 5620 samples, 10 classes
  • Training set: 90%, Test set: 10%
• #Chunk: 5, 10, 15, 20
• %Labeling: 75%
• Splitting tolerance: 25, 50, 100
• Evaluation measure: classification accuracy
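The sizes implied by this setup can be worked out with a short sketch. It assumes an exact 90/10 split and chunks that are as equal in size as possible; the slide states neither, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_sizes(n_samples: int, train_percent: int) -> Tuple[int, int]:
    """Return (training size, test size) for an integer training percentage."""
    n_train = n_samples * train_percent // 100
    return n_train, n_samples - n_train


def chunk_sizes(n_train: int, n_chunks: int) -> List[int]:
    """Split n_train items into n_chunks non-overlapping, near-equal chunks."""
    base, extra = divmod(n_train, n_chunks)
    # The first `extra` chunks receive one additional item.
    return [base + 1 if i < extra else base for i in range(n_chunks)]


if __name__ == "__main__":
    n_train, n_test = split_sizes(5620, 90)
    print(n_train, n_test)
    for n_chunks in (5, 10, 15, 20):
        sizes = chunk_sizes(n_train, n_chunks)
        print(n_chunks, sizes[0], sizes[-1])
```

Under these assumptions the training set holds 5058 samples and the test set 562, so with 5 chunks each chunk contains 1011 or 1012 samples.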

9. Trend of the reconstruction error
[Figure: #Chunk = 20, %Labeling = 75%, SplitTol = 25]

10. Accuracy values
[Figure panels: #Chunk = 5, #Chunk = 10, #Chunk = 15, #Chunk = 20]

11. Conclusions
• DISSFCM:
  • learns incrementally from data
  • adapts the number of clusters
  • injects a priori knowledge into the process
• Future work:
  • the merge activation conditions
  • the influence of the chunk composition
  • a mechanism to detect outliers, concept drift and the emergence of new classes

12. http://www.di.uniba.it/~cilab
