DATA MINING
TECHNIQUES
UNIT-III
Association Rule Mining
• AllElectronics: a customer buys a PC and a digital camera.
What should you recommend to the customer next?
Frequent patterns and association rules are the knowledge that you want to
mine.
• Frequent patterns: patterns that appear frequently in a data set.
• Frequent itemsets: sets of items, such as milk and bread, that appear
together frequently in a transaction data set.
• Frequent subsequences: sequences of items (for example, buying a PC, then a
digital camera) that appear frequently, in that order, across transactions.
• Frequent substructures: structural forms such as subgraphs, subtrees, or
sublattices, possibly combined with itemsets or subsequences; a substructure
that occurs frequently is called a frequent structured pattern.
Basic Concepts
• Mining frequent patterns plays an essential role in mining associations,
correlations, data classification, clustering, etc.
• Market Basket Analysis:
customer 1: milk, bread, cereal
customer 2: milk, bread, sugar, eggs
customer 3: milk, bread, butter
customer 4: sugar, eggs
• Which groups or sets of items are customers likely to purchase on a
given trip to the store?
Association Rules
• Support and confidence are two measures of rule interestingness.
Support: (usefulness of discovered rules)
Confidence: (certainty of discovered rules)
[support = 2%, confidence = 60%]
A support of 2% means that 2% of all the transactions under analysis show that
the computer and antivirus software are purchased together.
A confidence of 60% means that 60% of the customers who purchased a computer
also bought the software.
Association Rules
• Association rules are considered interesting if they satisfy both a minimum
support threshold and a minimum confidence threshold.
• Frequent itemsets, closed itemsets, and association rules:
I = {I1, I2, ..., In} - the set of items
D - the task-relevant data (a set of database transactions)
T - a transaction (a set of items such that T ⊆ I)
Rule: A => B
support(A => B) = P(A ∪ B)  (relative support)
confidence(A => B) = P(B|A)
Association Rules
• Item sets
• K-Item sets
• Occurrence frequency of an itemset
• Minimum support threshold: if the relative support of an itemset I satisfies a
prespecified minimum support threshold, then I is a frequent itemset.
• confidence(A => B) = P(B|A)
                     = support(A ∪ B) / support(A)
                     = support_count(A ∪ B) / support_count(A)
• Thus the problem of mining association rules can be reduced to that of mining
frequent itemsets.
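As an illustration, the following minimal Python sketch computes the relative support and confidence of the rule {milk, bread} => {butter} over the four market-basket transactions listed earlier (the rule and the resulting numbers are only illustrative):

```python
# Minimal worked example using the four market-basket transactions above.
transactions = [
    {"milk", "bread", "cereal"},          # customer 1
    {"milk", "bread", "sugar", "eggs"},   # customer 2
    {"milk", "bread", "butter"},          # customer 3
    {"sugar", "eggs"},                    # customer 4
]

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

A, B = {"milk", "bread"}, {"butter"}
support = support_count(A | B) / len(transactions)      # P(A ∪ B)
confidence = support_count(A | B) / support_count(A)    # P(B|A)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")   # 25%, 33%
```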
Frequent Item set in Data set (Association Rule
Mining)
• Association mining searches for frequent itemsets in the data set. Frequent
pattern mining uncovers interesting associations and correlations between item
sets in transactional and relational databases. In short, frequent pattern
mining shows which items appear together in a transaction or relation.
• Need for association mining:
Frequent pattern mining enables the generation of association rules from a
transactional dataset. If two items X and Y are frequently purchased together,
then it is good to place them together in stores or to offer a discount on one
item with the purchase of the other. This can really increase sales. For
example, it is likely that if a customer buys milk and bread, he/she also
buys butter.
So the association rule is ['milk'] ^ ['bread'] => ['butter']. So the seller can
suggest that the customer buy butter if he/she buys milk and bread.
Important Definitions :
• Support: It is one of the measures of interestingness. It tells us about the
usefulness and certainty of rules. A support of 5% means that 5% of the
transactions in the database follow the rule.
• Support(A -> B) = Support_count(A ∪ B)
• Confidence: A confidence of 60% means that 60% of the customers
who purchased milk and bread also bought butter.
• Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
• If a rule satisfies both minimum support and minimum confidence, it
is a strong rule.
Important Definitions :
• Support_count(X): the number of transactions in which X appears. If X
is A ∪ B, then it is the number of transactions in which A and B
are both present.
1. Maximal itemset: an itemset is maximal frequent if none of its
supersets are frequent.
2. Closed itemset: an itemset is closed if none of its immediate
supersets has the same support count as the itemset itself.
3. K-itemset: an itemset that contains K items is a K-itemset. An itemset is
frequent if its support count is greater than or equal to the minimum
support count.
Example On finding Frequent Itemsets
• Consider the given dataset of transactions.
• Let the minimum support count be 3.
• The following relation holds: maximal frequent ⊆ closed frequent ⊆ frequent.
• 1-frequent:
• {A} = 3; // not closed due to {A, C}; not maximal
• {B} = 4; // not closed due to {B, D}; not maximal
• {C} = 4; // not closed due to {C, D}; not maximal
• {D} = 5; // closed, since no immediate superset has the same count; not maximal
• 2-frequent:
• {A, B} = 2 // not frequent because support count < minimum support count so ignore
• {A, C} = 3 // not closed due to {A, C, D}
• {A, D} = 3 // not closed due to {A, C, D}
• {B, C} = 3 // not closed due to {B, C, D}
• {B, D} = 4 // closed but not maximal due to {B, C, D}
• {C, D} = 4 // closed but not maximal due to {B, C, D}
• 3-frequent:
• {A, B, C} = 2 // ignore not frequent because support count < minimum support count
• {A, B, D} = 2 // ignore not frequent because support count < minimum support count
• {A, C, D} = 3 // maximal frequent
• {B, C, D} = 3 // maximal frequent
• 4-frequent:
• {A, B, C, D} = 2 //ignore not frequent
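The transaction table itself appears only as an image on the slide; the five transactions below are one hypothetical dataset consistent with the support counts listed above. The sketch re-derives the frequent, closed, and maximal itemsets for a minimum support count of 3:

```python
from itertools import combinations

# Hypothetical dataset consistent with the counts above (the real table is on the slide).
transactions = [{"A", "B", "C", "D"}, {"A", "B", "C", "D"}, {"A", "C", "D"},
                {"B", "D"}, {"B", "C", "D"}]
min_sup_count = 3
items = sorted(set().union(*transactions))

# Support counts of all frequent itemsets.
support = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = frozenset(combo)
        count = sum(1 for t in transactions if s <= t)
        if count >= min_sup_count:
            support[s] = count

# Closed: no proper superset with the same support. Maximal: no frequent proper superset.
closed = {s for s in support if not any(s < t and support[t] == support[s] for t in support)}
maximal = {s for s in support if not any(s < t for t in support)}

print("closed: ", sorted(map(sorted, closed)))    # D, BD, CD, ACD, BCD
print("maximal:", sorted(map(sorted, maximal)))   # ACD, BCD
```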
AR as Two step Process
• Find all frequent item sets
• Generate strong association rules from the frequent item sets
• Challenge in mining frequent item sets:
• Closed frequent item set: An itemset X is closed in a data set D if there
exists no proper super-itemset Y such that Y has the same support
count as X in D
• Maximal frequent itemset: An itemset X is a maximal frequent
itemset in a data set D if X is frequent and there exists no super-itemset
Y such that X ⊂ Y and Y is frequent in D
Example: closed and maximal frequent
item sets
• A transaction database has only two transactions:
{<a1,a2,..a100>;<a1,a2,..a50>} Min_sup=1
• We find two closed frequent item sets and their support counts
C={{a1,a2,..a100}:1;{a1,a2,..a50}:2}
• Only one maximal frequent itemset:
M={{a1,a2,…a100}:1}
• We cannot include {a1,a2,..a50} as a maximal frequent itemset
because it has a frequent superset,{a1,a2,..a100}
• C-closed frequent item set, M-Maximal frequent item sets
Example: closed and maximal frequent
item sets
• The set of closed frequent itemsets contains complete information
about the frequent itemsets.
• From C, we can derive
(i) {a2, a45 : 2}, since {a2, a45} is a sub-itemset of the itemset
{a1, a2, ..., a50 : 2}
(ii) {a8, a55 : 1}, since {a8, a55} is not a sub-itemset of the previous
itemset but of the itemset {a1, a2, ..., a100 : 1}
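A small sketch of this derivation: because the closure of an itemset has the same support as the itemset itself, the support of any sub-itemset can be read off as the largest count among the closed itemsets that contain it.

```python
# The two closed frequent itemsets from the example, with their support counts.
C = {frozenset(f"a{i}" for i in range(1, 101)): 1,
     frozenset(f"a{i}" for i in range(1, 51)): 2}

def support_from_closed(itemset):
    # Support of an itemset = max count over the closed itemsets that contain it.
    counts = [count for closed, count in C.items() if itemset <= closed]
    return max(counts) if counts else 0

print(support_from_closed({"a2", "a45"}))   # 2
print(support_from_closed({"a8", "a55"}))   # 1
```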
Frequent Itemset Mining Methods: Apriori
and FP Growth
• Apriori algorithm:
Finds frequent itemsets by confined candidate generation.
A seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for
mining frequent itemsets.
The name of the algorithm reflects the fact that it uses prior
knowledge of frequent itemset properties.
Apriori property: all non-empty subsets of a frequent itemset must
also be frequent.
Each iteration consists of a join step and a prune step.
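A compact Python sketch of the join-and-prune loop just described (this is not the textbook pseudocode: candidate generation here simply joins any two (k-1)-itemsets whose union has size k, then prunes by the Apriori property):

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    """Return every frequent itemset (as a frozenset) with its support count."""
    transactions = [set(t) for t in transactions]
    # L1: frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets whose union has exactly k items.
        prev = list(current)
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune step (Apriori property): drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Scan the transactions to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_sup_count}
        frequent.update(current)
        k += 1
    return frequent
```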
Example: problem
Problem contd.,
Generating Association Rules from
frequent item sets
• Once the frequent item sets from transactions have been found, it is
straightforward to generate strong association rules from them
• Strong association rules satisfy both minimum support and minimum
confidence
• Confidence(A => B) = P(B|A)
                     = support_count(A ∪ B) / support_count(A)
Generating Association Rules from
frequent item sets
• Association rules are generated as follows:
For each frequent itemset l, generate all non-empty proper subsets of l.
For every non-empty proper subset s of l, output the rule
"s => (l - s)" if sup_count(l) / sup_count(s) >= min_conf.
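A sketch of this rule-generation step, assuming `frequent` is a dict mapping frozenset itemsets to support counts, such as the one returned by the Apriori sketch earlier:

```python
from itertools import chain, combinations

def generate_rules(frequent, min_conf):
    """frequent: dict of frozenset itemsets -> support counts (all subsets included)."""
    rules = []
    for l, sup_l in frequent.items():
        if len(l) < 2:
            continue
        # All non-empty proper subsets s of l.
        subsets = chain.from_iterable(combinations(l, r) for r in range(1, len(l)))
        for s in map(frozenset, subsets):
            conf = sup_l / frequent[s]          # sup_count(l) / sup_count(s)
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules
```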
Example: problem
Improving the efficiency of apriori
• Hash – based Technique: a hash based technique can be used to
reduce the size of the candidate k-item sets, cK ;k >1
• Example :
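The slide's example appears as an image; the sketch below illustrates the general idea with Python's built-in hash in place of the slide's specific hash function. While scanning transactions, every 2-itemset is hashed into a bucket, and only pairs whose bucket total reaches the minimum support count are kept as candidates in C2:

```python
from itertools import combinations

def hash_filtered_c2(transactions, min_sup_count, n_buckets=7):
    bucket_counts = [0] * n_buckets
    pairs_seen = set()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            bucket_counts[hash(pair) % n_buckets] += 1
            pairs_seen.add(pair)
    # A 2-itemset whose bucket total is below min_sup_count cannot be frequent.
    return {frozenset(p) for p in pairs_seen
            if bucket_counts[hash(p) % n_buckets] >= min_sup_count}
```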
Improving the efficiency of apriori
• Transaction reduction: reducing the number of transactions scanned in
future iterations.
• A transaction that does not contain any frequent k-item sets cannot
contain any frequent (k+1) item sets.
• Such a transaction can be marked or removed from further
consideration.
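A one-function sketch of transaction reduction: once the frequent k-itemsets are known, transactions containing none of them can be dropped before the next scan.

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    # A transaction with no frequent k-itemset cannot contain a frequent
    # (k+1)-itemset, so it is removed from further scans.
    return [t for t in transactions
            if any(itemset <= set(t) for itemset in frequent_k_itemsets)]
```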
Improving the efficiency of apriori
• Partitioning (2 DB scans):
Partitioning the data to find candidate itemsets requires only two database
scans to mine the frequent itemsets.
• Phase I:
Divide the transactions of D into n non-overlapping partitions.
Find the local frequent itemsets for each partition.
Any itemset that is frequent in D must occur as a frequent itemset in
at least one of the partitions.
Therefore all local frequent itemsets are candidate itemsets in D.
Improving the efficiency of apriori
• Phase II:
A second scan of D is conducted to determine the global frequent
itemsets; D is scanned only once in each phase (a sketch of both phases
follows below).
• Sampling
• Dynamic itemset counting
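The two-phase partitioning sketch referred to above, reusing the `apriori` sketch from earlier as the local miner (partition size and thresholds are illustrative):

```python
def partition_mine(transactions, n_partitions, min_sup_ratio):
    # Phase I: mine local frequent itemsets in each partition; their union is the
    # global candidate set (any globally frequent itemset is locally frequent somewhere).
    size = max(1, len(transactions) // n_partitions)
    candidates = set()
    for start in range(0, len(transactions), size):
        part = transactions[start:start + size]
        local_min = max(1, int(min_sup_ratio * len(part)))
        candidates |= set(apriori(part, local_min))
    # Phase II: a single second scan of D counts the global support of each candidate.
    counts = {c: sum(1 for t in transactions if c <= set(t)) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_sup_ratio * len(transactions)}
```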
A database has five transactions. Let min_sup = 60%
and min_conf = 80%.
A pattern-growth approach for mining
frequent item sets
• Apriori algorithm: disadvantages
• It is a generate-and-test method; reducing the size of the candidate sets
gives a good performance gain.
• However, it can still suffer from nontrivial costs (it may have to generate
huge candidate sets and repeatedly scan the database).
Frequent pattern growth or FP growth
(Divide and Conquer)
• Mines the complete set of frequent itemsets without such costly
candidate generation.
• First, it compresses the database representing frequent items into an FP-
tree, which retains the itemset association information.
• Create the root of the tree, labelled "null".
• Scan D a second time.
• The items in each transaction are processed in L order (L is the list of
frequent items sorted by descending support count), and a branch is
created for each transaction.
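A minimal Python sketch of the two database scans just described: the first scan builds L, the second inserts each transaction in L order into a tree rooted at a null node. The header table kept here is an implementation convenience for the mining step sketched after the next slide.

```python
from collections import defaultdict

class FPNode:
    """One node of the FP-tree: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_sup_count):
    # Scan 1: count item supports and build L, the frequent items in descending count order.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    L = [i for i, c in sorted(counts.items(), key=lambda kv: -kv[1]) if c >= min_sup_count]
    rank = {item: r for r, item in enumerate(L)}

    root = FPNode(None, None)            # root labelled "null"
    header = defaultdict(list)           # item -> every tree node carrying that item
    # Scan 2: insert each transaction with its frequent items sorted in L order.
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, L
```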
Mining the FP-tree
• Start from each frequent length-1 pattern (as an initial suffix pattern) and
construct its conditional pattern base.
• Then construct its conditional FP-tree and perform mining recursively
on that tree.
• Pattern growth is achieved by the concatenation of the suffix pattern with
the frequent patterns generated from a conditional FP-tree.
• This method reduces the search cost.
• Algorithm: FP-growth
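Using the header table from the FP-tree sketch above, the conditional pattern base of an item can be gathered by walking each of its nodes up to the root and recording the prefix path together with that node's count:

```python
def conditional_pattern_base(item, header):
    """Prefix paths that end just above `item`, each paired with that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base
```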
Mining frequent item sets using the
vertical data format
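A minimal sketch of the vertical data format: the horizontal TID-to-items layout is transformed into item-to-TID-set lists, and the support of any k-itemset is the size of the intersection of its members' TID sets (the tiny dataset is illustrative):

```python
from collections import defaultdict

horizontal = {1: {"milk", "bread"}, 2: {"milk", "butter"},
              3: {"bread", "butter"}, 4: {"milk", "bread", "butter"}}

vertical = defaultdict(set)                  # item -> TID set
for tid, items in horizontal.items():
    for item in items:
        vertical[item].add(tid)

tids = vertical["milk"] & vertical["bread"]   # TID set of the 2-itemset {milk, bread}
print(tids, "support count =", len(tids))     # {1, 4} support count = 2
```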
Mining closed and maximum patterns
• How can we mine closed frequent item sets?
• Strategies include:
Item merging
Sub-itemset pruning
Item skipping
• When a new frequent itemset is derived, it is necessary to perform two
kinds of closure checking:
Superset checking
Subset checking
Pattern Evaluation Methods
• Strong rules are not necessarily interesting:
Pattern Evaluation Methods
• From association analysis to correlation analysis:
• Correlation rule:
• Correlation measure:
Pattern Evaluation Methods: chi-square
measure
Comparison of pattern evaluation
measures
• All-confidence
• Max-confidence
• Kulczynski (Kulc)
• Cosine
• Null Transactions
• Null Invariant
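A sketch of these measures computed from raw support counts (s_a = sup(A), s_b = sup(B), s_ab = sup(A ∪ B)); lift is included for contrast because, unlike the four listed measures, it is not null-invariant. The example counts in the final line are hypothetical.

```python
from math import sqrt

def measures(s_a, s_b, s_ab, n_transactions):
    lift = (s_ab / n_transactions) / ((s_a / n_transactions) * (s_b / n_transactions))
    return {
        "lift":           lift,                           # affected by null transactions
        "all_confidence": s_ab / max(s_a, s_b),
        "max_confidence": max(s_ab / s_a, s_ab / s_b),
        "kulczynski":     0.5 * (s_ab / s_a + s_ab / s_b),
        "cosine":         s_ab / sqrt(s_a * s_b),
    }

# Hypothetical counts: 10,000 transactions, sup(A) = 1,000, sup(B) = 1,000, sup(A ∪ B) = 100.
print(measures(1000, 1000, 100, 10000))
```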
Advanced pattern mining
• What is pattern mining?
• Pattern mining: A Road map
Basic patterns: frequent pattern, closed pattern, max-pattern,
infrequent pattern or rare patterns, negative patterns
Based on the abstraction levels involved in a pattern: single-level
association rule, multilevel association rules
Pattern mining: A Road map
Based on the number of dimensions involved in the rule or pattern :
Single-dimensional association rule/pattern , Multidimensional
association rule/pattern
Pattern mining: A Road map
• Based on the types of values handled in the rule or pattern: Boolean
association rule, quantitative association rule
Pattern mining: A Road map
• Based on the constraints or criteria used to mine selective
patterns:constraint-based,approximate,compressed,near-match,top-
k,redundancy-aware top-k
• Based on kinds of data and features to be mined: sequential patterns,
structural patterns
• Based on application domain-specific semantics
• Based on data analysis usages: pattern-based classification, pattern-
based clustering
Pattern mining in multilevel,
multidimensional space
• Mining multilevel associations
Pattern mining in multilevel,
multidimensional space
• Using uniform minimum support for all levels
• Using reduced minimum support at lower levels
Pattern mining in multilevel,
multidimensional space
• Using item or group-based minimum support
Pattern mining in multilevel,
multidimensional space
• Mining Multidimensional Associations
Single dimensional or intradimensional association rules
Multi dimensional or interdimensional association rules
Pattern mining in multilevel,
multidimensional space
• Mining quantitative association rules
A data cube method
A clustering-based method
A statistical analysis method to uncover exceptional behaviours
Pattern mining in multilevel,
multidimensional space
• Mining rare patterns and negative patterns
Constraint-based frequent pattern mining
• It includes the following: Knowledge type constraints, data
constraints, dimension/level constraints, Interestingness constraints,
Rule constraints
• Meta-rule-guided mining of association rules
• Constraint-based pattern generation
• An efficient frequent pattern mining process can prune its search
space during mining in two ways:
Pruning pattern search space
Pruning data search space
Constraint-based frequent pattern mining
• There are five categories of pattern mining constraints:
Antimonotonic
Monotonic
Succinct
Convertible
Inconvertible
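As a concrete illustration of the first category, the sketch below applies a hypothetical antimonotonic rule constraint, "sum of item prices <= 100" (the item names and prices are made up): once an itemset violates it, every superset violates it too, so the itemset can be pruned from the pattern search space immediately.

```python
# Hypothetical prices for illustration only.
price = {"TV": 800, "camera": 300, "pen": 2, "notebook": 5, "ink": 10}
MAX_TOTAL = 100

def satisfies_antimonotone(itemset):
    # Antimonotone: if this fails for an itemset, it fails for every superset.
    return sum(price[i] for i in itemset) <= MAX_TOTAL

def prune_candidates(candidates):
    # Candidates violating the antimonotone constraint are removed from the search space.
    return [c for c in candidates if satisfies_antimonotone(c)]

print(prune_candidates([{"pen", "notebook"}, {"TV", "pen"}]))  # keeps only {'pen', 'notebook'}
```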
Constraint-based frequent pattern mining
• Pruning data space with data pruning constraints:
Data succinctness
Data antimonotonicity