Mining Association Rules in Large Databases
1. Er. Nawaraj Bhandari
Data Warehouse/Data Mining
Chapter 7: Mining Association Rules in Large Databases
2. Introduction
Association rule mining finds interesting association or correlation relationships
among a large set of data items.
With massive amounts of data continuously being collected and stored, many
industries are becoming interested in mining association rules from their
databases. Discovering interesting association relationships among huge amounts
of business transaction records can help in many business decision-making
processes, such as catalog design, cross-marketing, and loss-leader analysis.
A typical example of association rule mining is market basket analysis.
3. Association Rules
Association rules analyze and predict customer behavior.
They are expressed as if/then statements.
Examples:
Bread=>butter
If someone purchases bread, then he or she is likely to purchase butter.
Buys{onions, potatoes}=>buys{tomatoes}
4. Parts of Association Rules
Bread=>butter[20%, 45%]
Bread: Antecedent
Butter: Consequent
20%: Support
45%: Confidence
5. Support and Confidence
A=>B
Support denotes the probability that a transaction contains both A and B.
Confidence denotes the probability that a transaction containing A also
contains B.
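Written as formulas, the two definitions above take the following form. Here A ∪ B is read as "the transaction contains every item of A and every item of B"; this is the standard textbook notation, added only as a summary.

```latex
\begin{aligned}
\operatorname{support}(A \Rightarrow B) &= P(A \cup B)
  = \frac{\text{number of transactions containing both } A \text{ and } B}
         {\text{total number of transactions}} \\[4pt]
\operatorname{confidence}(A \Rightarrow B) &= P(B \mid A)
  = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)}
\end{aligned}
```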
6. Support and Confidence
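Consider a supermarket:
Total transactions: 100
Transactions containing bread: 20
So, 20/100 * 100 = 20% is the support of bread.
Of the 20 transactions containing bread, 9 also contain butter.
So, 9/20 * 100 = 45% is the confidence of Bread=>butter.
Note: under the definition of support as the probability that a transaction
contains both items, the support of the rule Bread=>butter itself is
9/100 * 100 = 9%, because only 9 transactions contain both bread and butter.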
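The arithmetic in this example can be reproduced with a short Python sketch. The transaction list is hypothetical: it is constructed only to match the counts given here (100 transactions, 20 with bread, 9 of those with butter), and the "milk" filler item and the function names are assumptions for illustration.

```python
# Hypothetical baskets built to match the counts in the example.
transactions = (
    [{"bread", "butter"}] * 9   # bread and butter together
    + [{"bread"}] * 11          # bread without butter
    + [{"milk"}] * 80           # filler baskets (assumed item)
)

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)

bread_support = support({"bread"}, transactions)
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)

print(f"support(bread)           = {bread_support:.0%}")
print(f"support(bread, butter)   = {rule_support:.0%}")
print(f"confidence(bread=>butter) = {rule_confidence:.0%}")
```

The sketch reports the share of baskets with bread and the share with both items separately, which makes it easy to see which of the two a quoted "support" figure refers to.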
7. Types of Association Rules
Single dimension association rule
Multidimensional association rule
Hybrid association rule
9. Multi dimension association rule
A rule with two or more dimensions (predicates).
Occupation(I.T), Age(>22)=>buys(laptops)
Here we have 3 dimensions, i.e. occupation, age, and buys.
In a multidimensional rule, no dimension is repeated.
10. Hybrid dimension association rule
Dimensions (predicates) can be repeated.
Time(5 o'clock), Buy(tea)=>Buy(biscuits)
If a person gets tea at 5 o'clock, he or she is likely to get biscuits as well.
Here the dimension Buy is repeated.
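The distinction between the rule types can be sketched in Python by counting the predicates a rule uses. The representation of a rule as lists of (predicate, value) pairs and the function name classify_rule are assumptions made for this illustration, not part of any library.

```python
def classify_rule(antecedent, consequent):
    """Classify a rule by the predicates (dimensions) it uses.

    Each side of the rule is a list of (predicate, value) pairs.
    """
    predicates = [name.lower() for name, _ in antecedent + consequent]
    distinct = set(predicates)
    if len(distinct) == 1:
        # Only one predicate appears, e.g. buys(...) => buys(...)
        return "single-dimensional"
    if len(distinct) == len(predicates):
        # Several predicates, each used exactly once
        return "multidimensional"
    # Several predicates, at least one of them used more than once
    return "hybrid-dimensional"

examples = {
    "buys{onions, potatoes} => buys{tomatoes}": (
        [("buys", "onions"), ("buys", "potatoes")], [("buys", "tomatoes")]),
    "Occupation(I.T), Age(>22) => buys(laptops)": (
        [("Occupation", "I.T"), ("Age", ">22")], [("buys", "laptops")]),
    "Time(5 o'clock), Buy(tea) => Buy(biscuits)": (
        [("Time", "5 o'clock"), ("Buy", "tea")], [("Buy", "biscuits")]),
}

for text, (lhs, rhs) in examples.items():
    print(f"{text}: {classify_rule(lhs, rhs)}")
```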
11. Fields of Association Rules
Web usage mining
Banking
Bioinformatics
Market basket analysis
Credit/debit card analysis
Product clustering
Catalog design
13. Apriori Algorithm
If you buy a toothbrush, you are shown a suggestion for toothpaste; if you buy
beer, you are shown suggestions for chips, potato crackers, etc.
Many e-commerce websites (such as Flipkart and Amazon) use purchase patterns
like these to make suggestions. The Apriori algorithm is one algorithm for
finding such patterns: it mines frequent itemsets from transaction data, from
which association rules are then generated.
18. Apriori Algorithm
L2: (the 2-itemsets whose support count meets the minimum support)
Item Set    Support Count
M, K        3
O, K        3
O, E        3
K, E        4
K, Y        3
20. Apriori Algorithm
L3: (the 3-itemsets whose support count meets the minimum support)
Item Set    Support Count
O, K, E     3
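The step from L2 to L3 relies on the Apriori property: a 3-itemset can only be frequent if all of its 2-item subsets are frequent. A minimal Python sketch of this join-and-prune step, using the L2 itemsets listed on the earlier slide, is shown below. The function name generate_candidates is an assumption, and the sketch only produces candidates; the support count of each candidate still has to be checked against the transactions.

```python
from itertools import combinations

# Frequent 2-itemsets (L2) as listed on the slide.
L2 = [frozenset(pair) for pair in (
    ("M", "K"), ("O", "K"), ("O", "E"), ("K", "E"), ("K", "Y"),
)]

def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets and prune by the Apriori property."""
    frequent = set(frequent)
    candidates = set()
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) != k:
            continue  # the two itemsets do not overlap in k-2 items
        # Keep the candidate only if every (k-1)-subset is itself frequent.
        if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
            candidates.add(union)
    return candidates

C3 = generate_candidates(L2, 3)
print(sorted(sorted(c) for c in C3))  # [['E', 'K', 'O']]
```

Candidates such as {M, O, K} or {K, E, Y} are pruned because {M, O} and {E, Y} are not in L2, which leaves {O, K, E} as the only 3-itemset to count.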
21. Apriori Algorithm
Now create association rules with support and confidence for the itemset O, K, E.
Association rules take the form:
O AND K GIVES E
Confidence = support count of the whole itemset / support count of the
antecedent (for O^K=>E, the antecedent is O AND K).
For example, confidence of O^K=>E = 3/3 = 1.
Association Rule    Support    Confidence    Confidence %
O^K=>E              3          3/3=1         100
O^E=>K              3          3/3=1         100
K^E=>O              3          3/4=0.75      75
E=>O^K              3          3/4=0.75      75
K=>O^E              3          3/5=0.6       60
O=>K^E              3          3/4=0.75      75
22. Apriori Algorithm
Compare these with the minimum confidence of 80%; only rules at or above it are kept.
Association Rule    Support    Confidence    Confidence %
O^K=>E              3          3/3=1         100
O^E=>K              3          3/3=1         100
Hence final association rules are:
O^K=>E
O^E=>K
Deriving rules like these from purchase transactions is called market basket analysis.
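The rule-generation step for the itemset O, K, E can be sketched in Python as follows. The support counts for the 2-itemsets and the 3-itemset come from the L2 and L3 tables; the single-item counts (O = 4, K = 5, E = 4) do not appear in the tables shown here and are taken from the denominators of the confidence column, so they are an assumption. The function name rules_from_itemset is also hypothetical.

```python
from itertools import combinations

support_count = {
    frozenset("OKE"): 3,                      # from the L3 table
    frozenset("OK"): 3, frozenset("OE"): 3,   # from the L2 table
    frozenset("KE"): 4,
    # Assumed: implied by the denominators in the confidence column.
    frozenset("O"): 4, frozenset("K"): 5, frozenset("E"): 4,
}

def rules_from_itemset(itemset, support_count, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule that passes."""
    itemset = frozenset(itemset)
    strong = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # confidence = count(whole itemset) / count(antecedent)
            conf = support_count[itemset] / support_count[antecedent]
            if conf >= min_confidence:
                strong.append((antecedent, consequent, conf))
    return strong

strong_rules = rules_from_itemset("OKE", support_count, min_confidence=0.80)
for antecedent, consequent, conf in strong_rules:
    lhs = "^".join(sorted(antecedent))
    rhs = "^".join(sorted(consequent))
    print(f"{lhs}=>{rhs}  confidence = {conf:.0%}")
```

With these counts only the two rules whose antecedents are {O, K} and {O, E} reach the 80% threshold, matching the final rules on the slide.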
23. Pros and Cons of Association Rule Mining
Pros
It is an easy-to-implement and easy-to-understand algorithm.
It can be used on large itemsets.
Cons
It may need to generate a large number of candidate itemsets and rules, which
can be computationally expensive.
Calculating support is also expensive because each pass has to scan the entire
database.
June 8, 2019 Data Mining: Concepts and Techniques 23
24. Assignment
Minimum support count: 2. Minimum confidence: 70%. Use the Apriori algorithm to
find the frequent itemsets and the strong association rules.
TID    Items
1      I1, I3, I4
2      I2, I3, I5
3      I1, I2, I3, I5
4      I2, I5
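A hand-worked answer can be checked against a small Apriori implementation. The following is a minimal Python sketch under stated assumptions, not a reference solution: minimum support is treated as a count of 2 transactions, the function names apriori and strong_rules are made up for this example, and no attempt is made at efficiency.

```python
from itertools import combinations

transactions = [
    {"I1", "I3", "I4"},
    {"I2", "I3", "I5"},
    {"I1", "I2", "I3", "I5"},
    {"I2", "I5"},
]

def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    def count_frequent(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support_count}

    singles = {frozenset([item]) for t in transactions for item in t}
    current = count_frequent(singles)
    frequent = dict(current)
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Join step, then prune candidates with an infrequent (k-1)-subset.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = count_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules that pass the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

frequent = apriori(transactions, min_support_count=2)
rules = strong_rules(frequent, min_confidence=0.70)

for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for antecedent, consequent, conf in rules:
    print("^".join(sorted(antecedent)), "=>", "^".join(sorted(consequent)), f"{conf:.0%}")
```

The lookup frequent[antecedent] is safe because every subset of a frequent itemset is itself frequent, so its count has already been recorded.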
25. References
1. Sam Anahory, Dennis Murray, “Data Warehousing in the Real World”, Pearson
Education.
2. R. Kimball, “The Data Warehouse Toolkit”, Wiley, 1996.
3. T. J. Teorey, “Database Modeling and Design: The Entity-Relationship Approach”,
Morgan Kaufmann Publishers, Inc., 1990.
4. S. Chaudhuri, “An Overview of Data Warehousing and OLAP Technology”,
Microsoft Research.
5. M. A. Shahzad, “Data Warehousing with Oracle”.
6. J. Han, M. Kamber, “Data Mining: Concepts and Techniques”, Second Edition,
Morgan Kaufmann, ISBN: 978-1-55860-901-3.