SlideShare une entreprise Scribd logo
1  sur  15
1
Discretization and Concept
Hierarchy Generation
2
Discretization
 Types of attributes:
 Nominal — values from an unordered set, e.g., color, profession
 Ordinal — values from an ordered set, e.g., military or academic
rank
 Continuous — real numbers, e.g., integer or real numbers
 Discretization:
 Divide the range of a continuous attribute into intervals
 Reduce data size by discretization
3
Discretization and Concept Hierarchy
 Discretization
 Reduce the number of values for a given continuous attribute
by dividing the range of the attribute into intervals
 Interval labels can then be used to replace actual data values
 Supervised vs. unsupervised
 Split (top-down) vs. merge (bottom-up)
 Discretization can be performed recursively on an attribute
4
Concept hierarchy
 Concept hierarchy formation
 Recursively reduce the data by collecting and replacing low level
concepts (such as numeric values for age) by higher level
concepts (such as young, middle-aged, or senior)
 Detail lost
 More meaningful
 Easier to interpret
 Mining becomes easier
 Several concept hierarchies can be defined for the same
attribute
 Manual / Implicit
5
Discretization and Concept Hierarchy
Generation for Numeric Data
 Typical methods:
 Binning
 Histogram analysis
 Clustering analysis
 Entropy-based discretization
 χ2
merging
 Segmentation by natural partitioning
All the methods can be applied recursively
6
Techniques
 Binning
 Distribute values into bins
 Replace by bin mean / median
 Recursive application – leads to concept hierarchies
 Unsupervised technique
 Histogram Analysis
 Data Distribution – Partition
 Equiwidth – (0-100], (100-200], …
 Equidepth
 Recursive
 Minimum Interval size
 Unsupervised
7
Techniques
 Cluster Analysis
 Clusters form nodes of concept hierarchy
 Can decompose / combine
 Lower level / higher level of hierarchy
8
Entropy-Based Discretization
 Given a set of samples S, if S is partitioned into two intervals S1 and S2
using boundary T, the expected information requirement after partitioning is
 Entropy is calculated based on class distribution of the samples in the set.
Given m classes, the entropy of S1 is
where pi is the probability of class i in S1
 The boundary that minimizes the expected information requirement over all
possible boundaries is selected as a binary discretization
 The process is recursively applied to partitions obtained until some stopping
criterion is met
)(
||
||
)(
||
||
),( 2
2
1
1
SEntropy
S
S
SEntropy
S
S
TSI +=
∑=
−=
m
i
ii ppSEntropy
1
21 )(log)(
9
 Reduces data size
 Class information is considered
 Improves accuracy
Entropy-Based Discretization
10
Interval Merging by χ2
Analysis
 ChiMerge
 Bottom-up approach
 find the best neighbouring intervals and merges them to form larger intervals
 Supervised
 If two adjacent intervals have similar distribution of classes – they can be
merged
 Initially each value is in a separate interval
 χ2
tests are performed for adjacent intervals. Those with least
values are merged
 Can be repeated
 Stopping condition (Threshold, Number of intervals)
11
Segmentation by Natural Partitioning
 A simply 3-4-5 rule can be used to segment numeric data into
relatively uniform, “natural” intervals.
 If an interval covers 3, 6, 7 or 9 distinct values at the most
significant digit, partition the range into 3 equi-width intervals
 If it covers 2, 4, or 8 distinct values at the most significant digit,
partition the range into 4 intervals
 If it covers 1, 5, or 10 distinct values at the most significant digit,
partition the range into 5 intervals
12
 Outliers could be present
 Consider only the majority values
 5th
percentile – 95th
percentile
Segmentation by Natural Partitioning
13
Example of 3-4-5 Rule
(-$400 -$5,000)
(-$400 - 0)
(-$400 -
-$300)
(-$300 -
-$200)
(-$200 -
-$100)
(-$100 -
0)
(0 - $1,000)
(0 -
$200)
($200 -
$400)
($400 -
$600)
($600 -
$800) ($800 -
$1,000)
($2,000 - $5, 000)
($2,000 -
$3,000)
($3,000 -
$4,000)
($4,000 -
$5,000)
($1,000 - $2, 000)
($1,000 -
$1,200)
($1,200 -
$1,400)
($1,400 -
$1,600)
($1,600 -
$1,800)
($1,800 -
$2,000)
msd=1,000 Low=-$1,000 High=$2,000Step 2:
Step 4:
Step 1: -$351 -$159 profit $1,838 $4,700
Min Low (i.e, 5%-tile) High(i.e, 95%-tile) Max
count
(-$1,000 - $2,000)
(-$1,000 - 0) (0 -$ 1,000)
Step 3:
($1,000 - $2,000)
14
Concept Hierarchy Generation for
Categorical Data
 Specification of a partial ordering of attributes explicitly at
the schema level by users or experts
 User / Expert defines hierarchy
 Street < city < state < country
 Specification of a portion of a hierarchy by explicit data
grouping
 Manual
 Intermediate level information specified
 Industrial, Agricultural..
15
Concept Hierarchy Generation for
Categorical Data
 Specification of a set of attributes but not their partial
ordering
 Automatically inferring the hierarchy
 Heuristic rule
 High level concepts contain a smaller number of values
 Specification of only a partial set of attributes
 Embedding data semantics
 Attributes with tight semantic connections are pinned together

Contenu connexe

Tendances

1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalitiesKrish_ver2
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data miningKrish_ver2
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision treeKrish_ver2
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data miningSlideshare
 
CS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_ArchitectureCS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_ArchitecturePalani Kumar
 
Spatial data mining
Spatial data miningSpatial data mining
Spatial data miningMITS Gwalior
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
4.3 multimedia datamining
4.3 multimedia datamining4.3 multimedia datamining
4.3 multimedia dataminingKrish_ver2
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guidethomasmary607
 

Tendances (20)

Data Mining
Data MiningData Mining
Data Mining
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
1.2 steps and functionalities
1.2 steps and functionalities1.2 steps and functionalities
1.2 steps and functionalities
 
data mining
data miningdata mining
data mining
 
4.2 spatial data mining
4.2 spatial data mining4.2 spatial data mining
4.2 spatial data mining
 
Data reduction
Data reductionData reduction
Data reduction
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Ppt
PptPpt
Ppt
 
Tree pruning
 Tree pruning Tree pruning
Tree pruning
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
CS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_ArchitectureCS8091_BDA_Unit_I_Analytical_Architecture
CS8091_BDA_Unit_I_Analytical_Architecture
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Spatial data mining
Spatial data miningSpatial data mining
Spatial data mining
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
4.3 multimedia datamining
4.3 multimedia datamining4.3 multimedia datamining
4.3 multimedia datamining
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 

En vedette

Concept description characterization and comparison
Concept description characterization and comparisonConcept description characterization and comparison
Concept description characterization and comparisonric_biet
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reductionmrizwan969
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reductionShatakirti Er
 
Different type of databases
Different type of databasesDifferent type of databases
Different type of databasesShwe Yee
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kambererror007
 
How I data mined my text message history
How I data mined my text message historyHow I data mined my text message history
How I data mined my text message historyJoe Cannatti Jr.
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and MiningDaniel JACOB
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationDataminingTools Inc
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsSalah Amean
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDatamining Tools
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsSalah Amean
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Miningidnats
 
Data cube computation
Data cube computationData cube computation
Data cube computationRashmi Sheikh
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysisDataminingTools Inc
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 

En vedette (20)

Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Concept description characterization and comparison
Concept description characterization and comparisonConcept description characterization and comparison
Concept description characterization and comparison
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reduction
 
Different type of databases
Different type of databasesDifferent type of databases
Different type of databases
 
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; KamberChapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han &amp; Kamber
 
How I data mined my text message history
How I data mined my text message historyHow I data mined my text message history
How I data mined my text message history
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and Mining
 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalization
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
Data visualization
Data visualizationData visualization
Data visualization
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Data cube computation
Data cube computationData cube computation
Data cube computation
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Similaire à 1.8 discretization

Classidication and Tabulation
Classidication and TabulationClassidication and Tabulation
Classidication and TabulationOsama Zahid
 
Data Mining DataLecture Notes for Chapter 2Introduc.docx
Data Mining DataLecture Notes for Chapter 2Introduc.docxData Mining DataLecture Notes for Chapter 2Introduc.docx
Data Mining DataLecture Notes for Chapter 2Introduc.docxwhittemorelucilla
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmPalin analytics
 
1) Chapter#02 Presentation of Data.ppt
1) Chapter#02 Presentation of Data.ppt1) Chapter#02 Presentation of Data.ppt
1) Chapter#02 Presentation of Data.pptMuntazirMehdi43
 
Class1.ppt
Class1.pptClass1.ppt
Class1.pptGautam G
 
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICSSTATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICSnagamani651296
 
Introduction to Statistics - Basics of Data - Class 1
Introduction to Statistics - Basics of Data - Class 1Introduction to Statistics - Basics of Data - Class 1
Introduction to Statistics - Basics of Data - Class 1RajnishSingh367990
 
ClustIII.ppt
ClustIII.pptClustIII.ppt
ClustIII.pptSueMiu
 

Similaire à 1.8 discretization (20)

Datamining
DataminingDatamining
Datamining
 
Classidication and Tabulation
Classidication and TabulationClassidication and Tabulation
Classidication and Tabulation
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision tree
Decision treeDecision tree
Decision tree
 
Classification
ClassificationClassification
Classification
 
Classification
ClassificationClassification
Classification
 
Data Mining - Exploring Data
Data Mining - Exploring DataData Mining - Exploring Data
Data Mining - Exploring Data
 
Datamining
DataminingDatamining
Datamining
 
Data Mining DataLecture Notes for Chapter 2Introduc.docx
Data Mining DataLecture Notes for Chapter 2Introduc.docxData Mining DataLecture Notes for Chapter 2Introduc.docx
Data Mining DataLecture Notes for Chapter 2Introduc.docx
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
1) Chapter#02 Presentation of Data.ppt
1) Chapter#02 Presentation of Data.ppt1) Chapter#02 Presentation of Data.ppt
1) Chapter#02 Presentation of Data.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
Class1.ppt
Class1.pptClass1.ppt
Class1.ppt
 
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICSSTATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
STATISTICS BASICS INCLUDING DESCRIPTIVE STATISTICS
 
Introduction to Statistics - Basics of Data - Class 1
Introduction to Statistics - Basics of Data - Class 1Introduction to Statistics - Basics of Data - Class 1
Introduction to Statistics - Basics of Data - Class 1
 
ClustIII.ppt
ClustIII.pptClustIII.ppt
ClustIII.ppt
 
Cluster Analysis.pptx
Cluster Analysis.pptxCluster Analysis.pptx
Cluster Analysis.pptx
 
Statistics.ppt
Statistics.pptStatistics.ppt
Statistics.ppt
 

Plus de Krish_ver2

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back trackingKrish_ver2
 
5.5 back track
5.5 back track5.5 back track
5.5 back trackKrish_ver2
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02Krish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructuresKrish_ver2
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithmKrish_ver2
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03Krish_ver2
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programmingKrish_ver2
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquerKrish_ver2
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03Krish_ver2
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02Krish_ver2
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing extKrish_ver2
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashingKrish_ver2
 

Plus de Krish_ver2 (20)

5.5 back tracking
5.5 back tracking5.5 back tracking
5.5 back tracking
 
5.5 back track
5.5 back track5.5 back track
5.5 back track
 
5.5 back tracking 02
5.5 back tracking 025.5 back tracking 02
5.5 back tracking 02
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
 
5.4 randamized algorithm
5.4 randamized algorithm5.4 randamized algorithm
5.4 randamized algorithm
 
5.3 dynamic programming 03
5.3 dynamic programming 035.3 dynamic programming 03
5.3 dynamic programming 03
 
5.3 dynamic programming
5.3 dynamic programming5.3 dynamic programming
5.3 dynamic programming
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.2 divide and conquer
5.2 divide and conquer5.2 divide and conquer
5.2 divide and conquer
 
5.2 divede and conquer 03
5.2 divede and conquer 035.2 divede and conquer 03
5.2 divede and conquer 03
 
5.1 greedyyy 02
5.1 greedyyy 025.1 greedyyy 02
5.1 greedyyy 02
 
5.1 greedy
5.1 greedy5.1 greedy
5.1 greedy
 
5.1 greedy 03
5.1 greedy 035.1 greedy 03
5.1 greedy 03
 
4.4 hashing02
4.4 hashing024.4 hashing02
4.4 hashing02
 
4.4 hashing
4.4 hashing4.4 hashing
4.4 hashing
 
4.4 hashing ext
4.4 hashing  ext4.4 hashing  ext
4.4 hashing ext
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
4.2 bst
4.2 bst4.2 bst
4.2 bst
 

Dernier

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 

Dernier (20)

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 

1.8 discretization

  • 2. 2 Discretization  Types of attributes:  Nominal — values from an unordered set, e.g., color, profession  Ordinal — values from an ordered set, e.g., military or academic rank  Continuous — real numbers, e.g., integer or real numbers  Discretization:  Divide the range of a continuous attribute into intervals  Reduce data size by discretization
  • 3. 3 Discretization and Concept Hierarchy  Discretization  Reduce the number of values for a given continuous attribute by dividing the range of the attribute into intervals  Interval labels can then be used to replace actual data values  Supervised vs. unsupervised  Split (top-down) vs. merge (bottom-up)  Discretization can be performed recursively on an attribute
  • 4. 4 Concept hierarchy  Concept hierarchy formation  Recursively reduce the data by collecting and replacing low level concepts (such as numeric values for age) by higher level concepts (such as young, middle-aged, or senior)  Detail lost  More meaningful  Easier to interpret  Mining becomes easier  Several concept hierarchies can be defined for the same attribute  Manual / Implicit
  • 5. 5 Discretization and Concept Hierarchy Generation for Numeric Data  Typical methods:  Binning  Histogram analysis  Clustering analysis  Entropy-based discretization  χ2 merging  Segmentation by natural partitioning All the methods can be applied recursively
  • 6. 6 Techniques  Binning  Distribute values into bins  Replace by bin mean / median  Recursive application – leads to concept hierarchies  Unsupervised technique  Histogram Analysis  Data Distribution – Partition  Equiwidth – (0-100], (100-200], …  Equidepth  Recursive  Minimum Interval size  Unsupervised
  • 7. 7 Techniques  Cluster Analysis  Clusters form nodes of concept hierarchy  Can decompose / combine  Lower level / higher level of hierarchy
  • 8. 8 Entropy-Based Discretization  Given a set of samples S, if S is partitioned into two intervals S1 and S2 using boundary T, the expected information requirement after partitioning is  Entropy is calculated based on class distribution of the samples in the set. Given m classes, the entropy of S1 is where pi is the probability of class i in S1  The boundary that minimizes the expected information requirement over all possible boundaries is selected as a binary discretization  The process is recursively applied to partitions obtained until some stopping criterion is met )( || || )( || || ),( 2 2 1 1 SEntropy S S SEntropy S S TSI += ∑= −= m i ii ppSEntropy 1 21 )(log)(
  • 9. 9  Reduces data size  Class information is considered  Improves accuracy Entropy-Based Discretization
  • 10. 10 Interval Merging by χ2 Analysis  ChiMerge  Bottom-up approach  find the best neighbouring intervals and merges them to form larger intervals  Supervised  If two adjacent intervals have similar distribution of classes – they can be merged  Initially each value is in a separate interval  χ2 tests are performed for adjacent intervals. Those with least values are merged  Can be repeated  Stopping condition (Threshold, Number of intervals)
  • 11. 11 Segmentation by Natural Partitioning  A simply 3-4-5 rule can be used to segment numeric data into relatively uniform, “natural” intervals.  If an interval covers 3, 6, 7 or 9 distinct values at the most significant digit, partition the range into 3 equi-width intervals  If it covers 2, 4, or 8 distinct values at the most significant digit, partition the range into 4 intervals  If it covers 1, 5, or 10 distinct values at the most significant digit, partition the range into 5 intervals
  • 12. 12  Outliers could be present  Consider only the majority values  5th percentile – 95th percentile Segmentation by Natural Partitioning
  • 13. 13 Example of 3-4-5 Rule (-$400 -$5,000) (-$400 - 0) (-$400 - -$300) (-$300 - -$200) (-$200 - -$100) (-$100 - 0) (0 - $1,000) (0 - $200) ($200 - $400) ($400 - $600) ($600 - $800) ($800 - $1,000) ($2,000 - $5, 000) ($2,000 - $3,000) ($3,000 - $4,000) ($4,000 - $5,000) ($1,000 - $2, 000) ($1,000 - $1,200) ($1,200 - $1,400) ($1,400 - $1,600) ($1,600 - $1,800) ($1,800 - $2,000) msd=1,000 Low=-$1,000 High=$2,000Step 2: Step 4: Step 1: -$351 -$159 profit $1,838 $4,700 Min Low (i.e, 5%-tile) High(i.e, 95%-tile) Max count (-$1,000 - $2,000) (-$1,000 - 0) (0 -$ 1,000) Step 3: ($1,000 - $2,000)
  • 14. 14 Concept Hierarchy Generation for Categorical Data  Specification of a partial ordering of attributes explicitly at the schema level by users or experts  User / Expert defines hierarchy  Street < city < state < country  Specification of a portion of a hierarchy by explicit data grouping  Manual  Intermediate level information specified  Industrial, Agricultural..
  • 15. 15 Concept Hierarchy Generation for Categorical Data  Specification of a set of attributes but not their partial ordering  Automatically inferring the hierarchy  Heuristic rule  High level concepts contain a smaller number of values  Specification of only a partial set of attributes  Embedding data semantics  Attributes with tight semantic connections are pinned together