SlideShare une entreprise Scribd logo
1  sur  16
Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Classification and Prediction
Chapter 8
Introduction
There are two forms of data analysis that can be used for extracting models
describing important classes or to predict future data trends.
These two forms are as follows:
 Classification
 Prediction
Introduction
 Classification models predict categorical class labels; and prediction models
predict continuous valued functions.
 For example, we can build a classification model to categorize bank loan
applications as either safe or risky
 Prediction model to predict the expenditures in dollars of potential customers
on computer equipment given their income and occupation.
What is classification?
 Following are the examples of cases where the data analysis task is
Classification:
 A bank loan officer wants to analyze the data in order to know which customer
(loan applicant) is risky or which are safe.
 A marketing manager at a company needs to analyze a customer with a given
profile, who will buy a new computer.
 In both of the above examples, a model or classifier is constructed to predict
the categorical labels. These labels are risky or safe for loan application data
and yes or no for marketing data.
What is prediction?
 Following are the examples of cases where the data analysis task is Prediction:
 Suppose the marketing manager needs to predict how much a given customer
will spend during a sale at his company.
 In this example we are bothered to predict a numeric value. Therefore the data
analysis task is an example of numeric prediction. In this case, a model or a
predictor will be constructed that predicts a continuous-valued-function or
ordered value.
 Regression analysis is a statistical methodology that is most often used for
numeric prediction.
How Does Classification Works?
With the help of the bank loan application that we have discussed above, let us
understand the working of classification. The Data Classification process includes
two steps:
 Building the Classifier or Model
 Using Classifier for Classification
Building the Classifier or Model
 This step is the learning step or the learning phase.
 In this step the classification algorithms build the classifier.
 The classifier is built from the training set made up of database tuples and their
associated class labels.
 Each tuple that constitutes the training set is referred to as a category or class.
These tuples can also be referred to as sample, object or data points.
Building the Classifier or Model
Using Classifier for Classification
 In this step, the classifier is used for classification.
 Here the test data is used to estimate the accuracy of classification rules.
 The classification rules can be applied to the new data tuples if the accuracy is
considered acceptable.
Classification by Decision Tree Induction
Decision tree induction is the learning of decision trees from class labeled
training tuples.
Decision tree is a flowchart-like tree structure where internal nodes (non leaf
node) denotes a test on an attribute branches represent outcomes of tests Leaf
nodes (terminal nodes) hold class labels and Root node is the topmost node.
Classification by Decision Tree Induction
Classification by Decision Tree Induction
Example
RID age income student credit-rating Class
1 youth high no fair ?
Test on age: youth
Test of student: no
Reach leaf node
Class NO: the customer Is Unlikely to buy a computer
Algorithm for constructing Decision Tress
Constructing a Decision tree uses greedy algorithm. Tree is constructed in a top-down recursive divide-
and-conquer manner.
• At start, all the training tuples are at the root
• Tuples are partitioned recursively based on selected attributes
• If all samples for a given node belong to the same class
Label the class
• If There are no remaining attributes for further partitioning
Majority voting is employed for classifying the leaf
• There are no samples left
Label the class and terminate
• Else
Got to step 2
K-Mean Algorithms
1. Take mean value
2. Find nearest number of mean and put in cluster.
3. Repeat one and two until we get same mean
References
1. Sam Anahory, Dennis Murray, “Data warehousing In the Real World”, Pearson
Education.
2. Kimball, R. “The Data Warehouse Toolkit”, Wiley, 1996.
3. Teorey, T. J., “Database Modeling and Design: The Entity-Relationship Approach”,
Morgan Kaufmann Publishers, Inc., 1990.
4. “An Overview of Data Warehousing and OLAP Technology”, S. Chaudhuri,
Microsoft Research
5. “Data Warehousing with Oracle”, M. A. Shahzad
6. “Data Mining Concepts and Techniques”, Morgan Kaufmann J. Han, M Kamber
Second Edition ISBN : 978-1-55860-901-3
ANY QUESTIONS?

Contenu connexe

Tendances

Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
sumit621
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
sumit621
 
Chapter 13 data warehousing
Chapter 13   data warehousingChapter 13   data warehousing
Chapter 13 data warehousing
sumit621
 

Tendances (19)

Data Mining
Data MiningData Mining
Data Mining
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Data mining
Data miningData mining
Data mining
 
Lecture1
Lecture1Lecture1
Lecture1
 
Cssu dw dm
Cssu dw dmCssu dw dm
Cssu dw dm
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 
142230 633685297550892500
142230 633685297550892500142230 633685297550892500
142230 633685297550892500
 
Database
DatabaseDatabase
Database
 
Chapter 13 data warehousing
Chapter 13   data warehousingChapter 13   data warehousing
Chapter 13 data warehousing
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Part1
Part1Part1
Part1
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
4 Data preparation and processing
4  Data preparation and processing4  Data preparation and processing
4 Data preparation and processing
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
Data mininng trends
Data mininng trendsData mininng trends
Data mininng trends
 

Similaire à Research trends in data warehousing and data mining

Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
iaeronlineexm
 
Module three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rulesModule three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rules
NivaTripathy1
 

Similaire à Research trends in data warehousing and data mining (20)

Classification
ClassificationClassification
Classification
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
3 classification
3  classification3  classification
3 classification
 
Classification and Prediction.pptx
Classification and Prediction.pptxClassification and Prediction.pptx
Classification and Prediction.pptx
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
 
Artificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxArtificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptx
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
classification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdfclassification in data mining and data warehousing.pdf
classification in data mining and data warehousing.pdf
 
Big Data Analytics - Unit 3.pptx
Big Data Analytics - Unit 3.pptxBig Data Analytics - Unit 3.pptx
Big Data Analytics - Unit 3.pptx
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Module three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rulesModule three ppt of DWDM. Details of data mining rules
Module three ppt of DWDM. Details of data mining rules
 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applications
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 

Plus de Er. Nawaraj Bhandari

Plus de Er. Nawaraj Bhandari (20)

Data mining approaches and methods
Data mining approaches and methodsData mining approaches and methods
Data mining approaches and methods
 
Mining Association Rules in Large Database
Mining Association Rules in Large DatabaseMining Association Rules in Large Database
Mining Association Rules in Large Database
 
Data warehouse testing
Data warehouse testingData warehouse testing
Data warehouse testing
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
Data warehouse logical design
Data warehouse logical designData warehouse logical design
Data warehouse logical design
 
Chapter 3: Simplification of Boolean Function
Chapter 3: Simplification of Boolean FunctionChapter 3: Simplification of Boolean Function
Chapter 3: Simplification of Boolean Function
 
Chapter 6: Sequential Logic
Chapter 6: Sequential LogicChapter 6: Sequential Logic
Chapter 6: Sequential Logic
 
Chapter 5: Cominational Logic with MSI and LSI
Chapter 5: Cominational Logic with MSI and LSIChapter 5: Cominational Logic with MSI and LSI
Chapter 5: Cominational Logic with MSI and LSI
 
Chapter 4: Combinational Logic
Chapter 4: Combinational LogicChapter 4: Combinational Logic
Chapter 4: Combinational Logic
 
Chapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic GatesChapter 2: Boolean Algebra and Logic Gates
Chapter 2: Boolean Algebra and Logic Gates
 
Chapter 1: Binary System
 Chapter 1: Binary System Chapter 1: Binary System
Chapter 1: Binary System
 
Introduction to Electronic Commerce
Introduction to Electronic CommerceIntroduction to Electronic Commerce
Introduction to Electronic Commerce
 
Evaluating software development
Evaluating software developmentEvaluating software development
Evaluating software development
 
Using macros in microsoft excel part 2
Using macros in microsoft excel   part 2Using macros in microsoft excel   part 2
Using macros in microsoft excel part 2
 
Using macros in microsoft excel part 1
Using macros in microsoft excel   part 1Using macros in microsoft excel   part 1
Using macros in microsoft excel part 1
 
Using macros in microsoft access
Using macros in microsoft accessUsing macros in microsoft access
Using macros in microsoft access
 
Testing software development
Testing software developmentTesting software development
Testing software development
 
Application software and business processes
Application software and business processesApplication software and business processes
Application software and business processes
 
An introduction to vba and macros
An introduction to vba and macrosAn introduction to vba and macros
An introduction to vba and macros
 
An introduction to end user software development
An introduction to end user software developmentAn introduction to end user software development
An introduction to end user software development
 

Dernier

Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 

Dernier (20)

Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 

Research trends in data warehousing and data mining

  • 1. Er. Nawaraj Bhandari Data Warehouse/Data Mining Classification and Prediction Chapter 8
  • 2. Introduction There are two forms of data analysis that can be used for extracting models describing important classes or to predict future data trends. These two forms are as follows:  Classification  Prediction
  • 3. Introduction  Classification models predict categorical class labels; and prediction models predict continuous valued functions.  For example, we can build a classification model to categorize bank loan applications as either safe or risky  Prediction model to predict the expenditures in dollars of potential customers on computer equipment given their income and occupation.
  • 4. What is classification?  Following are the examples of cases where the data analysis task is Classification:  A bank loan officer wants to analyze the data in order to know which customer (loan applicant) is risky or which are safe.  A marketing manager at a company needs to analyze a customer with a given profile, who will buy a new computer.  In both of the above examples, a model or classifier is constructed to predict the categorical labels. These labels are risky or safe for loan application data and yes or no for marketing data.
  • 5. What is prediction?  Following are the examples of cases where the data analysis task is Prediction:  Suppose the marketing manager needs to predict how much a given customer will spend during a sale at his company.  In this example we are bothered to predict a numeric value. Therefore the data analysis task is an example of numeric prediction. In this case, a model or a predictor will be constructed that predicts a continuous-valued-function or ordered value.  Regression analysis is a statistical methodology that is most often used for numeric prediction.
  • 6. How Does Classification Works? With the help of the bank loan application that we have discussed above, let us understand the working of classification. The Data Classification process includes two steps:  Building the Classifier or Model  Using Classifier for Classification
  • 7. Building the Classifier or Model  This step is the learning step or the learning phase.  In this step the classification algorithms build the classifier.  The classifier is built from the training set made up of database tuples and their associated class labels.  Each tuple that constitutes the training set is referred to as a category or class. These tuples can also be referred to as sample, object or data points.
  • 9. Using Classifier for Classification  In this step, the classifier is used for classification.  Here the test data is used to estimate the accuracy of classification rules.  The classification rules can be applied to the new data tuples if the accuracy is considered acceptable.
  • 10. Classification by Decision Tree Induction Decision tree induction is the learning of decision trees from class labeled training tuples. Decision tree is a flowchart-like tree structure where internal nodes (non leaf node) denotes a test on an attribute branches represent outcomes of tests Leaf nodes (terminal nodes) hold class labels and Root node is the topmost node.
  • 11. Classification by Decision Tree Induction
  • 12. Classification by Decision Tree Induction Example RID age income student credit-rating Class 1 youth high no fair ? Test on age: youth Test of student: no Reach leaf node Class NO: the customer Is Unlikely to buy a computer
  • 13. Algorithm for constructing Decision Tress Constructing a Decision tree uses greedy algorithm. Tree is constructed in a top-down recursive divide- and-conquer manner. • At start, all the training tuples are at the root • Tuples are partitioned recursively based on selected attributes • If all samples for a given node belong to the same class Label the class • If There are no remaining attributes for further partitioning Majority voting is employed for classifying the leaf • There are no samples left Label the class and terminate • Else Got to step 2
  • 14. K-Mean Algorithms 1. Take mean value 2. Find nearest number of mean and put in cluster. 3. Repeat one and two until we get same mean
  • 15. References 1. Sam Anahory, Dennis Murray, “Data warehousing In the Real World”, Pearson Education. 2. Kimball, R. “The Data Warehouse Toolkit”, Wiley, 1996. 3. Teorey, T. J., “Database Modeling and Design: The Entity-Relationship Approach”, Morgan Kaufmann Publishers, Inc., 1990. 4. “An Overview of Data Warehousing and OLAP Technology”, S. Chaudhuri, Microsoft Research 5. “Data Warehousing with Oracle”, M. A. Shahzad 6. “Data Mining Concepts and Techniques”, Morgan Kaufmann J. Han, M Kamber Second Edition ISBN : 978-1-55860-901-3

Notes de l'éditeur

  1. Example is inside hardcopy please follow hardcopy
  2. Example is inside hardcopy please follow hardcopy