Data Mining Technique For Classification and
Feature Evaluation Using Stream Mining

Ranjit R. Banshpal
OUTLINE
•Introduction
•Data streams classification
•Decision Tree
•VFDT
•Challenges
•Applications
•Conclusion
•References
Introduction
• What is data mining?
• Extracting knowledge from historical data.
• What is data stream mining?
• Extracting knowledge from real-time, high-speed data streams.
• Why do we use data stream mining?
Introduction (Cont…)
Examples:

Continuous Flow Data

Network Traffic Data

Sensor Data

Call Center Data
Data Stream Classification
• Uses past labeled data to build a classification model
• Predicts the labels of future instances using the model
• Helps decision making
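Fig: Classification of network traffic. Network traffic passes through the firewall to the classification model; attack traffic is blocked and quarantined, and benign traffic is forwarded to the server. Expert analysis and labeling feed the model update.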
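The build-then-predict loop described on this slide can be sketched as follows. This is only an illustration: the slides do not name a specific model here, so an incremental nearest-centroid classifier is used as a stand-in, and the class name `CentroidClassifier`, the two traffic features, and all values are hypothetical.

```python
from collections import defaultdict


class CentroidClassifier:
    """Toy incremental model (assumption): one running mean per class label."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sums
        self.counts = defaultdict(int)  # label -> number of examples seen

    def learn(self, x, label):
        """Update the model with one labeled instance."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x):
        """Return the label whose centroid is closest to x (None if untrained)."""
        if not self.sums:
            return None

        def squared_distance(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=squared_distance)


# Hypothetical features: (packets per second, mean packet size in bytes)
labeled_history = [
    ((10.0, 500.0), "benign"),
    ((12.0, 480.0), "benign"),
    ((900.0, 60.0), "attack"),
    ((950.0, 64.0), "attack"),
]

model = CentroidClassifier()
for x, y in labeled_history:      # build the model from past labeled data
    model.learn(x, y)

future = [(11.0, 510.0), (920.0, 62.0)]
predictions = [model.predict(x) for x in future]  # label future instances
print(predictions)
```

Because `learn` touches one instance at a time, newly labeled traffic (the expert labeling step in the figure) can be fed back into the same model without retraining from scratch.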
Decision Trees
• A decision tree is a classification model. Its
structure resembles a general tree or a flow
chart.
– Internal node: tests the value of an
attribute.
– Leaf node: holds a class label.

Fig: Decision Tree of Weather
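A minimal sketch of the structure described above: internal nodes test an attribute, leaves hold a class label. The weather figure itself is not recoverable from the slide, so the tree below is a hypothetical example in the style of the textbook weather data; the attribute names and values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label stored at the leaf


@dataclass
class Node:
    attribute: str                             # attribute tested at this node
    children: Dict[str, Union["Node", Leaf]]   # attribute value -> subtree


def classify(tree, example):
    """Follow attribute tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[example[tree.attribute]]
    return tree.label


# Hypothetical weather tree (not taken from the slide's figure).
weather_tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

result = classify(
    weather_tree,
    {"outlook": "sunny", "humidity": "normal", "windy": "false"},
)
print(result)
```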
Decision Tree (cont...)
• Limitations
– Classic decision tree learners assume that all
training data can be stored in main memory
at the same time.
– Disk-based decision tree learners repeatedly
read the training data from disk sequentially.
VFDT
• VFDT (Very Fast Decision Tree) takes less time to build than a classic decision tree.
• To find the best attribute at a node, it uses only a small
subset of the training examples that pass through that node.

– Given a stream of examples, use the first ones to
choose the root attribute.
– Once the root attribute is chosen, pass the succeeding
examples down to the corresponding leaves and use
them to choose the attributes there, and so on
recursively.
VFDT (cont...)
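Fig: VFDT tree growth over a data stream. The tree starts with a single test, Age<30? (Yes / No). When G(Car Type) - G(Gender) > ε, a leaf is split on Car Type (figure labels: Car Type = Sports Car?, Car Type = normal), with Yes / No outcomes.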
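The split test G(Car Type) - G(Gender) > ε can be sketched in code. The slide does not define ε; the sketch assumes it is the Hoeffding bound commonly used with VFDT, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split measure G, δ is the allowed error probability, and n is the number of examples seen at the leaf. The function names and numeric values are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Assumed form of epsilon: sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(g_best, g_second, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (g_best - g_second) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains: G(Car Type) = 0.30, G(Gender) = 0.22, R = 1, delta = 1e-7.
for n in (200, 5000):
    eps = hoeffding_bound(1.0, 1e-7, n)
    decision = should_split(0.30, 0.22, 1.0, 1e-7, n)
    print(n, round(eps, 4), decision)
```

With these numbers the gap of 0.08 is too small to justify a split after 200 examples, but it exceeds ε after 5000, which is why a leaf only needs to wait for enough examples rather than for the whole data set.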
Challenges
• Infinite length
• Concept-drift
• Concept-evolution
• Feature Evolution
• The data stream is divided into equal-sized chunks (input to the algorithm).
• Outlier detection module with classifier ensemble M.
• Outlier instances are stored in the buffer.
• Instances in the buffer are grouped into clusters.
• Each cluster is transformed into a pseudopoint data structure; the set of pseudopoints H stores (centroid, weight, radius) for each pseudopoint.
• A q-NSC value is calculated for every instance.
• If tp is greater than the threshold, the corresponding classifier votes in favor of another class.
Fig: Workflow for identifying concept evolution.
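The pseudopoint summary (centroid, weight, radius) from the workflow can be sketched as below. The slide does not define the three fields, so the sketch assumes that the weight is the number of instances in the cluster and the radius is the distance from the centroid to the farthest member; the outlier test (an instance lying outside every pseudopoint) is likewise an assumption, and the names `Pseudopoint`, `summarize`, and `is_outlier` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Pseudopoint:
    centroid: List[float]  # mean of the cluster's instances
    weight: int            # assumed: number of instances in the cluster
    radius: float          # assumed: distance to the farthest instance


def summarize(cluster):
    """Replace a cluster of raw instances with its pseudopoint summary."""
    n = len(cluster)
    dims = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / n for d in range(dims)]
    radius = max(math.dist(centroid, p) for p in cluster)
    return Pseudopoint(centroid, n, radius)


def is_outlier(x, pseudopoints):
    """Assumed test: x is an outlier if it lies outside every pseudopoint."""
    return all(math.dist(x, pp.centroid) > pp.radius for pp in pseudopoints)


cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
H = [summarize(cluster)]          # set of pseudopoints
print(H[0])
print(is_outlier((1.5, 1.5), H))  # inside the radius
print(is_outlier((5.0, 5.0), H))  # outside the radius
```

Keeping only the summary lets the raw instances of a chunk be discarded, which matters when the stream is unbounded.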
Feature-Evolution
Applications
•Applicable to many domains such as
•Intrusion detection system.
•Share Market Data.
•Security Monitoring.
•Network monitoring and traffic engineering.
•Business : credit card transaction flows.
•Telecommunication calling records.
•Web logs and web page click streams.
Conclusion
• In data stream classification, the VFDT algorithm efficiently
classifies high-dimensional data, including data that belongs to another (novel) class.
• VFDT then demonstrates two key mechanisms of the novel ("another") class
detection technique: outlier detection and multiple class
detection.
Any
Questions?
THANK YOU


Editor's Notes

  1. Data streams are continuous flows of data, for example network traffic, sensor data, and call center records.