Data stream classification
by incremental
semi-supervised fuzzy clustering
G. Casalino, G. Castellano, C. Castiello, A.M. Fanelli, C. Mencar
CVPL2018
gabriella.casalino@uniba.it
CVPL2018 - Vico Equense,
Italy, 30-31 August 2018
Data streams
• Continuous flow of data
• sensors, online transactions, health monitoring, network traffic,…
• Impractical to store and use all data
• Need for new techniques that:
• Process a finite number of data samples at a time
• Use a limited amount of memory
• Predict/classify at any time and in a limited amount of time
• Take into account the evolution of data
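One way to meet the first two requirements is to consume the stream in fixed-size batches. The sketch below is only an illustration: the helper name `iter_chunks` and the chunk size are assumptions, not part of the presented method.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping chunks of at most chunk_size items.

    Only one chunk is held in memory at a time, so the stream itself
    can be arbitrarily long. (Hypothetical helper for illustration.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# A toy "stream" of ten readings, processed four at a time.
chunks = list(iter_chunks(range(10), 4))
```

The last chunk may be shorter than the others; how a real system treats such a partial chunk is a design choice the slides do not address.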
Proposed method
• DISSFCM: Dynamic Incremental Semi-Supervised Fuzzy C-Means
• a method for data stream classification that
• works in an incremental way
• dynamically adapts the number of clusters:
• a fixed number of clusters may not adequately capture the evolving
structure of streaming data
• uses both labeled and unlabeled data (semi-supervised)
• uses fuzzy logic to describe patterns in data
Proposed method
• Based on a semi-supervised fuzzy clustering
algorithm
• Applied to subsequent, non-overlapping chunks of
data so as to enable continuous update of clusters
• SSFCM: Semi-Supervised FCM (Pedrycz and
Waletzky, 1997)
[Formula: SSFCM objective function, annotated "Supervised component"]
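For reference, the SSFCM objective is commonly written in the form below. This is a sketch based on the general literature on the method, not a transcription of the slide, and all symbol names are assumptions: K clusters, N samples, u_{jk} the membership of sample j in cluster k, d_{jk} the distance between sample j and prototype k, b_j equal to 1 if sample j is labeled and 0 otherwise, f_{jk} the membership implied by the label, m the fuzzifier (2 is a common choice), and α a weight balancing the two terms.

```latex
J = \sum_{k=1}^{K}\sum_{j=1}^{N} u_{jk}^{m}\, d_{jk}^{2}
  + \alpha \sum_{k=1}^{K}\sum_{j=1}^{N} \left(u_{jk} - b_{j} f_{jk}\right)^{m} d_{jk}^{2}
```

The first sum is the usual unsupervised fuzzy C-means term; the second sum is the supervised component, which penalizes memberships of labeled samples that disagree with their labels.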
Split
• When the cluster quality deteriorates from one data
chunk to another, the number of clusters is
increased (by splitting some clusters)
• The cluster quality is evaluated in terms of the
reconstruction error (Pedrycz, 2008)
• The cluster with the highest reconstruction
error is split into two clusters
• To find the two new prototypes, conditional fuzzy
clustering is applied to the data belonging to that cluster
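The selection of the cluster to split can be sketched as follows. This is a minimal illustration under stated assumptions: each sample is reconstructed as the membership-weighted mean of the prototypes, and its squared error is charged to the cluster in which it has the highest membership. The function names and the per-cluster attribution rule are hypothetical, and the conditional fuzzy clustering step that produces the two new prototypes is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def reconstruct(u_row: Sequence[float], prototypes: Sequence[Vector],
                m: float = 2.0) -> List[float]:
    """Rebuild one sample as the membership-weighted mean of the prototypes."""
    weights = [u ** m for u in u_row]
    total = sum(weights)
    dim = len(prototypes[0])
    return [sum(w * p[d] for w, p in zip(weights, prototypes)) / total
            for d in range(dim)]


def cluster_errors(data: Sequence[Vector], memberships: Sequence[Sequence[float]],
                   prototypes: Sequence[Vector], m: float = 2.0) -> List[float]:
    """Squared reconstruction error accumulated per cluster.

    Assumption: a sample's error is charged to its highest-membership cluster.
    """
    errors = [0.0] * len(prototypes)
    for x, u_row in zip(data, memberships):
        x_hat = reconstruct(u_row, prototypes, m)
        squared_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        owner = max(range(len(u_row)), key=lambda i: u_row[i])
        errors[owner] += squared_error
    return errors


def cluster_to_split(errors: Sequence[float]) -> int:
    """Index of the cluster with the highest reconstruction error."""
    return max(range(len(errors)), key=lambda i: errors[i])


# Toy data: the third sample is poorly represented by the second prototype.
prototypes = [[0.0, 0.0], [10.0, 10.0]]
data = [[0.0, 0.0], [10.0, 10.0], [14.0, 14.0]]
memberships = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
errors = cluster_errors(data, memberships, prototypes)
worst = cluster_to_split(errors)
```

In this toy case all of the error comes from the sample at (14, 14), so the second cluster is the split candidate.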
Merge
• The two nearest clusters whose prototypes share the same
label are merged into one if:
• the number of clusters exceeds a predefined threshold
• the number of data samples belonging to a cluster is below a
predefined threshold
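The merge step can be sketched as below. The slides state only which clusters are merged, not how the merged prototype is computed, so replacing the pair with its midpoint is an assumption made for illustration; the function names are hypothetical, and the threshold checks that trigger the merge are left out.

```python
from itertools import combinations
from math import dist, inf
from typing import List, Optional, Sequence, Tuple


def nearest_same_label_pair(prototypes: Sequence[Sequence[float]],
                            labels: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return the indices of the two closest prototypes that share a label."""
    best_pair, best_distance = None, inf
    for i, j in combinations(range(len(prototypes)), 2):
        if labels[i] != labels[j]:
            continue
        distance = dist(prototypes[i], prototypes[j])
        if distance < best_distance:
            best_pair, best_distance = (i, j), distance
    return best_pair


def merge_pair(prototypes: Sequence[Sequence[float]], labels: Sequence[int],
               pair: Tuple[int, int]) -> Tuple[List[List[float]], List[int]]:
    """Replace the two prototypes in `pair` with their midpoint (assumption)."""
    i, j = pair
    merged = [(a + b) / 2 for a, b in zip(prototypes[i], prototypes[j])]
    keep = [k for k in range(len(prototypes)) if k not in (i, j)]
    new_prototypes = [list(prototypes[k]) for k in keep] + [merged]
    new_labels = [labels[k] for k in keep] + [labels[i]]
    return new_prototypes, new_labels


# Four prototypes, two per class; the class-0 pair is the closest one.
prototypes = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [5.0, 5.0]]
labels = [0, 0, 1, 1]
pair = nearest_same_label_pair(prototypes, labels)
new_prototypes, new_labels = merge_pair(prototypes, labels, pair)
```

Note that the prototype at (0.5, 0) is closer to both class-0 prototypes than they are to each other, but it is never paired with them because its label differs.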
DISSFCM
Experimental results
• Optical Recognition of Handwritten Digits dataset
• 5620 samples, 10 classes
• Training set: 90%, Test set: 10%
• #Chunk: 5, 10, 15, 20
• %Labeling: 75%
• Splitting tolerance: 25, 50, 100
• Evaluation measure: classification accuracy
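To give a sense of scale, the arithmetic implied by this setup can be worked out as follows. The sketch assumes that the chunks are equal-sized and drawn from the training set only, and that every chunk count is combined with every splitting tolerance; the slides do not confirm either point, nor how leftover samples are handled.

```python
from itertools import product

TOTAL_SAMPLES = 5620
TRAIN_PERCENT = 90
CHUNK_COUNTS = [5, 10, 15, 20]
SPLIT_TOLERANCES = [25, 50, 100]

# Integer arithmetic avoids floating-point rounding in the 90% split.
n_train = TOTAL_SAMPLES * TRAIN_PERCENT // 100
n_test = TOTAL_SAMPLES - n_train

# Samples per chunk if the training set is cut into equal parts
# (assumption; any remainder is ignored here).
chunk_sizes = {n_chunks: n_train // n_chunks for n_chunks in CHUNK_COUNTS}

# Full grid of (chunk count, splitting tolerance) settings, assuming
# that every combination is evaluated.
configurations = list(product(CHUNK_COUNTS, SPLIT_TOLERANCES))
```

Under these assumptions there are 5058 training samples and 562 test samples, and more chunks mean fewer samples available to each clustering run.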
Trend of the reconstruction error
[Figure: reconstruction error trend for #Chunk=20, %Labeling=75%, SplitTol=25]
Accuracy values
[Figure: four accuracy panels, for #Chunk=5, #Chunk=10, #Chunk=15 and #Chunk=20]
Conclusions
• DISSFCM can:
• learn incrementally from data
• adapt the number of clusters
• inject a priori knowledge into the process
• Future work:
• the merge activation conditions
• the influence of the chunk composition
• a mechanism to detect outliers, concept drift and the emergence of
new classes.
http://www.di.uniba.it/~cilab
