Evaluating classification algorithms applied to data streams
Author: Ing. Esteban D. Donato
Advisor: Dr. Fazel Famili
Co-Advisor: Dra. Ana S. Haedo
Dec-2009
Maestría en Explotación de Datos y Descubrimiento del Conocimiento (Master's in Data Mining and Knowledge Discovery)
Introduction
Objective
Related work
Related work (Cont.): Data Streams Mining
Related work (Cont.): Very Fast Decision Tree (VFDT)
Related work (Cont.): Concept Drift
Conclusion of literature review
Algorithm: VFDTc
Algorithm: UFFT
Algorithm: CVFDT
Performance measures
Data sets generated
Data sets generated
Dataset with no concept drift, outliers, or noise
Dataset with 10% noisy data
Dataset with 1% outliers
Dataset with 3 concept drifts
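The slides do not say how these four data sets were produced, so the following is only a minimal sketch of one way such streams could be generated. The two-feature threshold concept, the class-swap drift, the feature scaling used for outliers, and every name and parameter (`generate_stream`, `drift_points`, `noise_rate`, `outlier_rate`, `true_label`) are assumptions made for illustration, not the author's generator.

```python
import random
from typing import Iterator, Sequence, Tuple

Item = Tuple[Tuple[float, float], int]


def true_label(x: Tuple[float, float], version: int) -> int:
    """Hypothetical concept: Class 1 if x0 + x1 > 1, else Class 2.
    Each drift increments `version`; odd versions swap the two classes."""
    label = 1 if x[0] + x[1] > 1.0 else 2
    return 3 - label if version % 2 == 1 else label


def generate_stream(
    n_items: int,
    drift_points: Sequence[int] = (),
    noise_rate: float = 0.0,
    outlier_rate: float = 0.0,
    seed: int = 0,
) -> Iterator[Item]:
    """Yield (features, label) pairs one at a time, as a stream would."""
    rng = random.Random(seed)
    version = 0
    for i in range(n_items):
        if i in drift_points:
            version += 1  # sudden concept drift at this position
        x = (rng.random(), rng.random())
        label = true_label(x, version)
        if rng.random() < noise_rate:
            label = 3 - label  # label noise: flip the class
        if rng.random() < outlier_rate:
            x = (x[0] * 10.0, x[1] * 10.0)  # outlier: features far outside [0, 1]
        yield x, label


# Four streams mirroring the four data sets listed on the slide.
clean = list(generate_stream(2000))
noisy = list(generate_stream(2000, noise_rate=0.10))
outliers = list(generate_stream(2000, outlier_rate=0.01))
drifting = list(generate_stream(2000, drift_points=(500, 1000, 1500)))
```

Drawing the noise and outlier decisions for every item, even when the rates are zero, keeps the feature values identical across the four streams for a given seed, which makes the variants directly comparable.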
Results: Capacity to detect and respond to concept drift
Results: Capacity to detect and respond to virtual concept drift
Results: Capacity to detect and respond to recurring concept drift
Results: Capacity to adapt to sudden concept drift
Results: Capacity to adapt to gradual concept drift
Results: Capacity to adapt to frequent concept drift
Results: Accuracy of the classification task

VFDTc (CA)
                  Predicted Class 1   Predicted Class 2
Actual Class 1    44.5% (887)         5.5% (109)
Actual Class 2    5% (101)            45% (903)

VFDTc (EBP)
                  Predicted Class 1   Predicted Class 2
Actual Class 1    39% (777)           11% (219)
Actual Class 2    9% (173)            41% (831)

UFFT
                  Predicted Class 1   Predicted Class 2
Actual Class 1    46% (928)           3.5% (68)
Actual Class 2    2.5% (48)           48% (956)

CVFDT
                  Predicted Class 1   Predicted Class 2
Actual Class 1    34.5% (685)         15.5% (311)
Actual Class 2    15.5% (312)         34.5% (692)

Measures derived from the confusion matrix (AC = accuracy, TP = true positive, FP = false positive, TN = true negative, FN = false negative, P = precision):

Algorithm      AC     TP     FP     TN     FN     P
VFDTc (CA)     0.89   0.89   0.10   0.90   0.11   0.90
VFDTc (EBP)    0.80   0.78   0.17   0.83   0.22   0.82
UFFT           0.94   0.93   0.05   0.95   0.07   0.95
CVFDT          0.69   0.69   0.31   0.69   0.31   0.69
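The derived measures can be recomputed from the raw counts in the confusion matrices. The sketch below assumes that Class 1 is the positive class and that TP, FP, TN and FN are rates taken over the actual class totals; the slides do not state this, but the assumption appears consistent with the reported values to within rounding. The class and field names (`ConfusionMatrix`, `tp_rate`, and so on) are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # actual Class 1, predicted Class 1
    fn: int  # actual Class 1, predicted Class 2
    fp: int  # actual Class 2, predicted Class 1
    tn: int  # actual Class 2, predicted Class 2

    def measures(self) -> Dict[str, float]:
        """Return the six measures, assuming Class 1 is the positive class."""
        total = self.tp + self.fn + self.fp + self.tn
        actual_pos = self.tp + self.fn
        actual_neg = self.fp + self.tn
        return {
            "accuracy": (self.tp + self.tn) / total,
            "tp_rate": self.tp / actual_pos,
            "fp_rate": self.fp / actual_neg,
            "tn_rate": self.tn / actual_neg,
            "fn_rate": self.fn / actual_pos,
            "precision": self.tp / (self.tp + self.fp),
        }


# Counts copied from the confusion matrices on the slide.
MATRICES = {
    "VFDTc (CA)": ConfusionMatrix(tp=887, fn=109, fp=101, tn=903),
    "VFDTc (EBP)": ConfusionMatrix(tp=777, fn=219, fp=173, tn=831),
    "UFFT": ConfusionMatrix(tp=928, fn=68, fp=48, tn=956),
    "CVFDT": ConfusionMatrix(tp=685, fn=311, fp=312, tn=692),
}

for name, cm in MATRICES.items():
    row = "  ".join(f"{key}={value:.3f}" for key, value in cm.measures().items())
    print(f"{name:12s} {row}")
```

Printing three decimals rather than two makes it easier to see where the slide's two-decimal figures sit relative to the exact ratios.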
Results: Dealing with outliers
Results: Dealing with noisy data
Results: Speed (time taken to process an item in the stream)
Conclusions & future work
Conclusions & future work (Cont.)
E-mail: [email_address]
Twitter: @eddonato
