Data mining- Association Analysis -market basket
1. Application of Association Mining in Analyzing the Consumer Behavior by Market Basket Transaction
Association Analysis of Market Basket Transaction
13.11.14
Prepared by: Sowmiyan Morri, Swapnil Soni (DoMS, IISc)
Course: Data Mining
Instructors: Prof Parthasarathy
2. Index
• Visualization of dataset
• Pre-processing of dataset
• Association analysis - 3 tasks
    Results
    Insights
• Classification Vs Association
• Conclusion & Recommendation
    For Business
    For Business Analyst
3. Visualization of dataset
Transaction ID   Item-1         Item-2        Item-3   --   Item-70               Total
                 Acorn Squash   Apple Brats   Bacon    --   Yukon Gold Potatoes
1                T              F             F        --   --                    1
2                F              F             F        --   --                    1
3                F              F             T        --   --                    2
4                F              F             F        --   --                    1
5                F              F             F        --   --                    1
6                F              T             F        --   --                    1
7                F              F             F        --   --                    1
8                F              F             F        --   --                    2
9                F              F             F        --   --                    1
10               F              F             F        --   --                    1
11               F              F             F        --   --                    3
12               F              F             F        --   --                    2
13               F              F             F        --   --                    2
14               F              F             F        --   --                    3
15               F              F             F        --   --                    1
--               --             --            --       --   --                    2
1731             --             --            --       --   --                    1
Total            76             38            39       --   71                    3815
Support          4.39%          2.20%         2.25%    --   4.10%
Total no. of Attributes/Items: 70
Total no. of Transactions: 1731
4. Visualization of dataset
[Figure: Frequency of Attributes (support count of each 1-itemset); y-axis from 0 to 180]
Statistics
Range [0,1731]
Average 54.5
Std Deviation 51.4
Min 1
Max 167
Attention:
Maximum support any itemset can have = 167/1731 = 9.6% (the support of the most frequent single item)
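The support figures above can be reproduced with a few lines of Python. This is only a sketch: the four item counts and the maximum count of 167 are copied from the slides, while the function name `support` is an illustrative choice, not part of the original analysis.

```python
# Counts copied from the dataset table; 1731 is the number of transactions.
N_TRANSACTIONS = 1731
item_counts = {
    "Acorn Squash": 76,
    "Apple Brats": 38,
    "Bacon": 39,
    "Yukon Gold Potatoes": 71,
}

def support(count, n=N_TRANSACTIONS):
    """Support = fraction of transactions that contain the item(set)."""
    return count / n

for item, count in item_counts.items():
    print(f"{item}: {support(count):.2%}")

# The most frequent single item occurs in 167 transactions.
# An itemset can occur no more often than any one of its items,
# so this value bounds the support of every itemset in the dataset.
max_support = support(167)
print(f"Upper bound on itemset support: {max_support:.1%}")
```

Because the bound is below 10%, any minimum support of 10% or more cannot yield a frequent itemset on this dataset.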
[Figure: No. of Items in Transaction, plotted per transaction ID (T_ID); y-axis from 0 to 16]
Quite sparse dataset
Pre-processing required!
Statistics
Range [0,70]
Average 2.20
Std Deviation 1.8
Min 1
Max 15
Real motivation-
‘Weka’ failed to handle the dataset!
5. Pre-processing of dataset
Total no. of Attributes/Items 70
Total no. of Transactions 1731
Total no. of Attributes/Items with support <2% 34
Total no. of Items after pruning 36
Pruning of attributes below the desired level of support
Logic: Apriori principle- if an itemset is not frequent, then none of its supersets can be frequent
Gain: Computation & memory are reduced by pruning
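The pruning step can be sketched as follows. The toy baskets and the function name `prune_infrequent_items` are hypothetical and only illustrate the idea; on the slide's dataset the same step at 2% support removes 34 of the 70 items and keeps 36.

```python
def prune_infrequent_items(transactions, min_support):
    """Drop every item whose support is below min_support."""
    n = len(transactions)
    counts = {}
    for basket in transactions:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    # Items that survive the support threshold.
    keep = {item for item, c in counts.items() if c / n >= min_support}
    # Remove pruned items from each basket (a basket may become empty).
    pruned = [basket & keep for basket in transactions]
    return keep, pruned

# Toy data (hypothetical, not the course dataset).
toy = [
    {"Butter", "Blue cheese"},
    {"Butter", "Bacon"},
    {"Butter"},
    {"Blue cheese", "Butter"},
    {"Apple Brats"},
]
kept, pruned = prune_infrequent_items(toy, min_support=0.4)
print(sorted(kept))
print(pruned)
```

Dropping infrequent single items first is safe under the Apriori principle, since no itemset containing a pruned item could be frequent.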
6. Task-1
Fix the confidence level at 60%. Set the minimum support at 2%, 5%,
10%, 20%, and 50%, run the Apriori algorithm to discover association
rules and summarize your findings.
7. Task-1: Result
Confidence: 60%
Minimum Support                       2%    5%    10%   20%   50%
Rules generated                       297   22    NA    NA    NA
Generated sets of large itemsets:
Size of set of large itemsets L(1)    36    18    NA    NA    NA
Size of set of large itemsets L(2)    37    10    NA    NA    NA
Size of set of large itemsets L(3)    36    2     NA    NA    NA
Size of set of large itemsets L(4)    21    NA    NA    NA    NA
Size of set of large itemsets L(5)    5     NA    NA    NA    NA
Total Itemsets                        135   30    0     0     0
[Figure: Min Support Vs Rules @ 60% Confidence; Rules generated: 297 at 2%, 22 at 5%; Itemsets: 135 at 2%, 30 at 5%]
Inferences
1. Frequent itemsets are found only up to a Min Support of 5%; at 10% and above there are none, because the most frequent single item has a support of only 9.6%
2. The number of frequent itemsets reduces as Min Support increases
3. At a fixed confidence level, the number of Association Rules decreases as the number of frequent
itemsets decreases
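The effect of the minimum support on the number of frequent itemsets can be illustrated with a small level-wise Apriori sketch. It is a minimal pure-Python version written for this summary, not the Weka implementation used for the results above, and the toy baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {k: {itemset: support}} for all frequent itemsets, level by level."""
    n = len(transactions)

    def frequent(candidates):
        result = {}
        for cand in candidates:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                result[cand] = sup
        return result

    items = {item for t in transactions for item in t}
    level = frequent({frozenset([i]) for i in items})
    levels, k = {}, 1
    while level:
        levels[k] = level
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)
    return levels

# Toy data (hypothetical, not the course dataset).
toy = [
    {"Butter", "Blue cheese", "Black eye peas"},
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas"},
    {"Blue cheese", "Black eye peas", "Butter"},
    {"Bacon"},
]
for min_sup in (0.4, 0.6, 0.9):
    sizes = {k: len(v) for k, v in apriori(toy, min_sup).items()}
    print(f"min support {min_sup:.0%}: itemsets per level {sizes}")
```

As in the result table, raising the threshold shrinks every level, and once it exceeds the support of the most frequent single item no itemsets remain.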
8. Task-1: Insights
Top-10 Rules
Antecedent > Consequent
1. Butter, Earthworm Segments > Black eye peas
2. Black eye peas, Blue cheese > Butter
3. Black eye peas, Butter > Earthworm Segments
4. Black eye peas > Earthworm Segments
5. Butter > Blue cheese
6. Black eye peas, Butter > Blue cheese
7. Chilly Red Flame > Earthworm Segments
8. Blue cheese > Butter
9. Black eye peas, Earthworm Segments > Butter
10. Basilisk Tail > Strawberry Essence
9. Task-2
Fix the minimum support at 2%. Set the confidence level at 90%, 80%,
70%, 60%, and 50%, run the Apriori algorithm to discover association
rules and summarize your findings.
10. Task-2: Result
Minimum Support: 2%
Confidence                            90%   80%   70%   60%   50%   40%   30%   20%   10%   5%
Rules generated                       134   140   245   297   417   478   596   734   734   734
Generated sets of large itemsets:
Size of set of large itemsets L(1)    36    36    36    36    36    36    36    36    36    36
Size of set of large itemsets L(2)    37    37    37    37    37    37    37    37    37    37
Size of set of large itemsets L(3)    36    36    36    36    36    36    36    36    36    36
Size of set of large itemsets L(4)    21    21    21    21    21    21    21    21    21    21
Size of set of large itemsets L(5)    5     5     5     5     5     5     5     5     5     5
Total                                 135   135   135   135   135   135   135   135   135   135
[Figure: Confidence Vs Rules @ 2% Min Support; x-axis: Confidence (90% down to 5%); series: Rules generated, Itemsets]
Inference
1. At a fixed Min Support, the number of frequent itemsets remains constant irrespective of the Confidence level
2. The number of Rules increases as the Confidence level decreases
3. The maximum number of Rules that can be extracted at the given Min Support is 734
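The plateau in the number of rules can be reproduced on a toy example. The sketch below enumerates every rule A > B from a fixed list of frequent itemsets and counts how many pass each confidence threshold; the baskets, the itemset list and the function names are hypothetical, and the rule counting may differ from Weka's.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def generate_rules(frequent_itemsets, transactions, min_confidence):
    """Rules A > B with A, B non-empty, A | B frequent, confidence >= threshold."""
    rules = []
    for itemset in frequent_itemsets:
        if len(itemset) < 2:
            continue
        whole = support_count(itemset, transactions)
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # confidence(A > B) = support(A | B) / support(A)
                conf = whole / support_count(a, transactions)
                if conf >= min_confidence:
                    rules.append((a, itemset - a, conf))
    return rules

# Toy data (hypothetical, not the course dataset).
toy = [
    {"Butter", "Blue cheese", "Black eye peas"},
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas"},
    {"Blue cheese", "Black eye peas", "Butter"},
    {"Bacon"},
]
# Frequent itemsets of size >= 2 at 40% minimum support on the toy data.
frequent = [
    frozenset({"Butter", "Blue cheese"}),
    frozenset({"Butter", "Black eye peas"}),
    frozenset({"Blue cheese", "Black eye peas"}),
    frozenset({"Butter", "Blue cheese", "Black eye peas"}),
]
for conf in (0.9, 0.7, 0.6, 0.5, 0.1):
    print(f"confidence {conf:.0%}: {len(generate_rules(frequent, toy, conf))} rules")
```

The frequent itemsets do not depend on the confidence threshold, so lowering it can only admit more of the same finite set of candidate rules; once all of them pass, the count stops growing.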
11. Task-2: Insights
Antecedent > Consequent
1. Butter, Earthworm Segments > Black eye peas
2. Black eye peas, Blue cheese > Butter
3. Chilly Red Flame, Black eye peas > Earthworm Segments
4. Garden soil, Strawberry Essence > Salamander Skin
5. Basilisk Tail, Salamander Skin > Strawberry Essence
6. Blue cheese, Earthworm Segments > Black eye peas
7. Blue cheese, Earthworm Segments > Butter
8. Butter, Blue cheese, Earthworm Segments > Black eye peas
9. Black eye peas, Blue cheese, Earthworm Segments > Butter
10. Blue cheese, Earthworm Segments > Black eye peas, Butter
12. Task-3
Identify the dairy products (milk, cheese etc.) from the item lists and
group them into one binary variable. If a transaction has dairy products,
replace them (only the dairy products) with the binary variable. Use it as
the class label and build a decision tree using ID3 to predict the purchase
of dairy products. Compare the rules generated from the decision tree
with those generated earlier. Draw conclusions on the impact of
minimum support and confidence levels.
Supervised Learning
Pre-determined Class Attribute: Dairy Product
13. Task-3: Pre-processing
Dairy Products (8 nos.): Bluecheese, Butter, ButterCheese, EwezerellaCheese, FetaCheese, JuustoleipaCheese, saltedsweetcreambutter, VanillaIceCream
Total no. of Independent Attributes: 62
Total no. of Transactions: 1731
Class Attribute: Dairy Product
Transaction ID   Attributes                                                       Class Attribute
                 Item-1         Item-2        Item-3   --   Item-62               Dairy Product
                 Acorn Squash   Apple Brats   Bacon    --   Yukon Gold Potatoes
1                T              F             F        --   --                    F
2                F              F             F        --   --                    F
3                F              F             T        --   --                    F
4                F              F             F        --   --                    F
5                F              F             F        --   --                    F
6                F              T             F        --   --                    F
7                F              F             F        --   --                    F
8                F              F             F        --   --                    F
9                F              F             F        --   --                    F
10               F              F             F        --   --                    F
11               F              F             F        --   --                    F
12               F              F             F        --   --                    F
13               F              F             F        --   --                    F
14               F              F             F        --   --                    F
15               F              F             F        --   --                    F
--               --             --            --       --   --                    F
1731             --             --            --       --   --                    --
Supervised classification:
ID3 Algorithm applied!
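The pre-processing and the split criterion used by ID3 can be sketched as below. This is not a full ID3 implementation: it only derives the Dairy Product class label and computes the information gain of a single binary attribute. The dairy item names are taken from the slide; the toy baskets and function names are hypothetical.

```python
from math import log2

# The eight dairy items listed on the slide.
DAIRY = {"Bluecheese", "Butter", "ButterCheese", "EwezerellaCheese",
         "FetaCheese", "JuustoleipaCheese", "saltedsweetcreambutter",
         "VanillaIceCream"}

def to_labelled_row(basket):
    """Split a basket into (non-dairy items, Dairy Product class label)."""
    return basket - DAIRY, bool(basket & DAIRY)

def entropy(labels):
    """Binary entropy (in bits) of a list of True/False labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def information_gain(rows, item):
    """Entropy reduction from splitting rows on presence/absence of item."""
    labels = [label for _, label in rows]
    present = [label for items, label in rows if item in items]
    absent = [label for items, label in rows if item not in items]
    n = len(rows)
    remainder = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder

# Toy baskets (hypothetical, not the course dataset).
baskets = [
    {"Butter", "Bacon"},
    {"FetaCheese", "Bacon"},
    {"Apple Brats"},
    {"Acorn Squash", "Apple Brats"},
]
rows = [to_labelled_row(b) for b in baskets]
for item in ("Bacon", "Acorn Squash"):
    print(item, round(information_gain(rows, item), 4))
```

ID3 would pick the attribute with the highest gain at each node and recurse on both the "present" and the "absent" branch, which is why absence of an item carries as much weight as presence in the resulting tree.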
15. Conclusion & Recommendation
For Business Analyst
Comparison between Supervised & Un-supervised learning
Supervised learning- Classification
A large number of ‘Binary’ attributes makes the decision tree explode into a huge, uninterpretable tree.
Highly conditional decisions: if item-1 is absent, item-2 is absent, … and so on, then Dairy product = Yes.
Symmetric treatment: ‘Presence’ & ‘Absence’ of an item in a transaction are treated with equal
importance.
Unsupervised learning- Association
A large number of ‘Binary’ attributes is handled prudently using the ‘Min Support’ criterion; only
qualified attributes/itemsets are considered for analysis.
Asymmetric treatment: only the ‘Presence’ of an item in a transaction is of interest.
Simple and interpretable rules are required for market basket transactions to design market
strategies such as ‘Cross selling’.
Association mining is observed to be a better technique for Market Basket Analysis!
16. Conclusion & Recommendation
Good opportunity to maximize revenue by deploying ‘Association Mining’!
For Business
• Trivial Associations
The relation among Dairy products (‘Butter’, ‘Black Eye Peas’, ‘Blue Cheese’) seems
obvious, as they act as supplements of Vitamin ‘D’.
The relation between ‘Salamander Skin’ & ‘Strawberry Essence’ is observed because they are used
to make Salamander Brandy through a fermentation process.
• Non-trivial Associations
Relation between ‘Garden Soil’ & ‘Strawberry Essence’
Relation between ‘Earthworm’ & ‘Black Eye Peas’
Cross selling
Source: http://www.grailtrail.ndo.co.uk/grails/brandy.html
http://greatist.com/health/18-surprising-dairy-free-sources-calcium