Prof. Pier Luca Lanzi
Association Rules
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
Readings
• “Data Mining and Analysis” by Zaki & Meira
  - Chapter 8
  - Section 9.1
  - Chapter 10 up to Section 10.2.1 included
  - Section 12.1
• http://www.dataminingbook.info
Market-Basket Transactions
TID  Items
1    Bread, Peanuts, Milk, Fruit, Jam
2    Bread, Jam, Soda, Chips, Milk, Fruit
3    Steak, Jam, Soda, Chips, Bread
4    Jam, Soda, Peanuts, Milk, Fruit
5    Jam, Soda, Chips, Milk, Bread
6    Fruit, Soda, Chips, Milk
7    Fruit, Soda, Peanuts, Milk
8    Fruit, Peanuts, Cheese, Yogurt
Association Rules
What is Association Rule Mining?
• Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
• Applications
  - Basket data analysis
  - Cross-marketing
  - Catalog design
  - …
What is Association Rule Mining?
• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
• Examples
  - {bread} ⇒ {milk}
  - {soda} ⇒ {chips}
  - {bread} ⇒ {jam}
TID Items
1 Bread, Peanuts, Milk, Fruit, Jam
2 Bread, Jam, Soda, Chips, Milk, Fruit
3 Steak, Jam, Soda, Chips, Bread
4 Jam, Soda, Peanuts, Milk, Fruit
5 Jam, Soda, Chips, Milk, Bread
6 Fruit, Soda, Chips, Milk
7 Fruit, Soda, Peanuts, Milk
8 Fruit, Peanuts, Cheese, Yogurt
Frequent Itemset
• Itemset
  - A collection of one or more items, e.g., {milk, bread, jam}
  - k-itemset, an itemset that contains k items
• Support count (σ)
  - Frequency of occurrence of an itemset
  - σ({Milk, Bread}) = 3, σ({Soda, Chips}) = 4
• Support (s)
  - Fraction of transactions that contain an itemset
  - s({Milk, Bread}) = 3/8, s({Soda, Chips}) = 4/8
• Frequent Itemset
  - An itemset whose support is greater than or equal to a minsup threshold
TID Items
1 Bread, Peanuts, Milk, Fruit, Jam
2 Bread, Jam, Soda, Chips, Milk, Fruit
3 Steak, Jam, Soda, Chips, Bread
4 Jam, Soda, Peanuts, Milk, Fruit
5 Jam, Soda, Chips, Milk, Bread
6 Fruit, Soda, Chips, Milk
7 Fruit, Soda, Peanuts, Milk
8 Fruit, Peanuts, Cheese, Yogurt
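
The support counts above are easy to verify programmatically. Below is a minimal Python sketch; the transaction list mirrors the table, while the helper name support_count is illustrative and not from the slides:

transactions = [
    {"Bread", "Peanuts", "Milk", "Fruit", "Jam"},
    {"Bread", "Jam", "Soda", "Chips", "Milk", "Fruit"},
    {"Steak", "Jam", "Soda", "Chips", "Bread"},
    {"Jam", "Soda", "Peanuts", "Milk", "Fruit"},
    {"Jam", "Soda", "Chips", "Milk", "Bread"},
    {"Fruit", "Soda", "Chips", "Milk"},
    {"Fruit", "Soda", "Peanuts", "Milk"},
    {"Fruit", "Peanuts", "Cheese", "Yogurt"},
]

def support_count(itemset, transactions):
    # sigma(X): number of transactions that contain every item of X
    return sum(1 for t in transactions if set(itemset) <= t)

print(support_count({"Milk", "Bread"}, transactions))  # 3
print(support_count({"Soda", "Chips"}, transactions))  # 4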
What is An Association Rule?
• Implication of the form X ⇒ Y, where X and Y are itemsets
• Example: {bread} ⇒ {milk}
• Rule Evaluation Metrics: Support & Confidence
• Support (s)
  - Fraction of transactions that contain both X and Y: s(X ⇒ Y) = σ(X ∪ Y)/N
• Confidence (c)
  - Measures how often items in Y appear in transactions that contain X: c(X ⇒ Y) = σ(X ∪ Y)/σ(X)
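
Both metrics follow directly from support counts; a short sketch reusing the transactions list and support_count helper defined above (the function name is again illustrative):

def rule_metrics(X, Y, transactions):
    # support and confidence of the rule X => Y
    n_xy = support_count(set(X) | set(Y), transactions)
    support = n_xy / len(transactions)
    confidence = n_xy / support_count(X, transactions)
    return support, confidence

print(rule_metrics({"Bread", "Jam"}, {"Milk"}, transactions))
# (0.375, 0.75) -- the slide that follows rounds the support to 0.4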
What is the Goal?
• Given a set of transactions T, the goal of association rule mining is to find all rules having
  - support ≥ minsup threshold
  - confidence ≥ minconf threshold
• Brute-force approach
  - List all possible association rules
  - Compute the support and confidence for each rule
  - Prune rules that fail the minsup and minconf thresholds
• The brute-force approach is computationally prohibitive!
Mining Association Rules
• {Bread, Jam} ⇒ {Milk} s=0.4 c=0.75
• {Milk, Jam} ⇒ {Bread} s=0.4 c=0.75
• {Bread} ⇒ {Milk, Jam} s=0.4 c=0.75
• {Jam} ⇒ {Bread, Milk} s=0.4 c=0.6
• {Milk} ⇒ {Bread, Jam} s=0.4 c=0.5
• All the above rules are binary partitions of the same itemset {Milk, Bread, Jam}
• Rules originating from the same itemset have identical support but can have different confidence
• We can decouple the support and confidence requirements!
Mining Association Rules: Two-Step Approach
• Frequent Itemset Generation
  - Generate all itemsets whose support ≥ minsup
• Rule Generation
  - Generate high-confidence rules from each frequent itemset
  - Each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is computationally expensive
Frequent Itemset Generation
[Figure: the lattice of all itemsets over the items A, B, C, D, E, from the null set down to ABCDE]
Given d items, there are 2^d possible candidate itemsets.
Brute Force [figure]
Frequent Itemset Generation
• Brute-force approach:
  - Each itemset in the lattice is a candidate frequent itemset
  - Count the support of each candidate by scanning the database
  - Match each of the N transactions (of maximum width w) against each of the M candidates
  - Complexity is about O(NMw), expensive since M = 2^d
TID Items
1 Bread, Peanuts, Milk, Fruit, Jam
2 Bread, Jam, Soda, Chips, Milk, Fruit
3 Steak, Jam, Soda, Chips, Bread
4 Jam, Soda, Peanuts, Milk, Fruit
5 Jam, Soda, Chips, Milk, Bread
6 Fruit, Soda, Chips, Milk
7 Fruit, Soda, Peanuts, Milk
8 Fruit, Peanuts, Cheese, Yogurt
Computational Complexity
• Given d unique items there are 2^d possible itemsets
• The total number of possible association rules is
  R = 3^d − 2^(d+1) + 1
• For d = 6, there are 602 rules
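
The count for d = 6 can be checked in one line of Python:

# number of possible association rules over d items
d = 6
print(3**d - 2**(d + 1) + 1)  # 602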
Frequent Itemset Generation Strategies
• Reduce the number of candidates (M)
  - Complete search has M = 2^d
  - Use pruning techniques to reduce M
• Reduce the number of transactions (N)
  - Reduce the size of N as the size of the itemset increases
• Reduce the number of comparisons (NM)
  - Use efficient data structures to store the candidates or transactions
  - No need to match every candidate against every transaction
Reducing the Number of Candidates
• Apriori principle
  - If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:
  for all itemsets X and Y, X ⊆ Y implies sup(X) ≥ sup(Y)
• The support of an itemset never exceeds the support of its subsets
• This is known as the anti-monotone property of support
Illustrating Apriori Principle
[Figure: two copies of the itemset lattice over A–E; once an itemset is found to be infrequent, all of its supersets are pruned from the search]
How does the Apriori principle work?
Minimum Support = 4

1-itemsets:
Item     Count
Bread    4
Peanuts  4
Milk     6
Fruit    6
Jam      5
Soda     6
Chips    4
Steak    1
Cheese   1
Yogurt   1

2-itemsets:
2-Itemset       Count
Bread, Jam      4
Peanuts, Fruit  4
Milk, Fruit     5
Milk, Jam       4
Milk, Soda      5
Fruit, Soda     4
Jam, Soda       4
Soda, Chips     4

3-itemsets:
3-Itemset          Count
Milk, Fruit, Soda  4
Example
• Given the following database [not reproduced here] and a min support of 3, generate all the frequent itemsets
The Apriori Algorithm
• Let k = 1
• Generate frequent itemsets of length 1
• Repeat until no new frequent itemsets are identified (see the sketch below)
  - Generate length (k+1) candidate itemsets from the length-k frequent itemsets
  - Prune candidate itemsets containing subsets of length k that are infrequent
  - Count the support of each candidate by scanning the DB
  - Eliminate candidates that are infrequent, leaving only those that are frequent
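
A minimal Python sketch of this loop, reusing the transactions list and support_count helper from above (a didactic sketch, not the optimized algorithm of the original paper):

from itertools import combinations

def apriori(transactions, minsup_count):
    # frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]) for i in items
                if support_count({i}, transactions) >= minsup_count}
    result = set(frequent)
    k = 1
    while frequent:
        # join step: merge frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # prune step: drop candidates with an infrequent k-subset,
        # then count support with a database scan
        frequent = {c for c in candidates
                    if all(frozenset(s) in result for s in combinations(c, k))
                    and support_count(c, transactions) >= minsup_count}
        result |= frequent
        k += 1
    return result

for itemset in sorted(apriori(transactions, 4), key=len):
    print(sorted(itemset))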
Apriori Algorithm [pseudocode figures]

Apriori Algorithm Example [worked-example figure]
Eclat Algorithm
The Eclat Algorithm
• Leverages the tidsets (the sets of transaction ids containing an itemset) directly for support computation
• The support of a candidate itemset can be computed by intersecting the tidsets of suitably chosen subsets
• Given t(X) and t(Y) for any two frequent itemsets X and Y, t(XY) = t(X) ∩ t(Y)
• And sup(XY) = |t(XY)| (see the sketch below)
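
A compact depth-first sketch of this idea in Python; the vertical (item → tidset) layout is built from the transactions list used earlier, and all names are illustrative:

def eclat(prefix, tidsets, minsup_count, out):
    # tidsets: list of (itemset-as-tuple, tidset) pairs sharing the prefix
    while tidsets:
        itemset, tids = tidsets.pop()
        out[prefix + itemset] = len(tids)  # support via tidset size
        # extend by intersecting with the remaining tidsets
        suffix = [(other, tids & otids) for other, otids in tidsets
                  if len(tids & otids) >= minsup_count]
        if suffix:
            eclat(prefix + itemset, suffix, minsup_count, out)

vertical = {}
for tid, t in enumerate(transactions, start=1):
    for item in t:
        vertical.setdefault(item, set()).add(tid)

minsup, out = 4, {}
eclat((), [((i,), tids) for i, tids in vertical.items()
           if len(tids) >= minsup], minsup, out)
print(sorted(out.items(), key=lambda kv: (len(kv[0]), kv[0])))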
Example [Eclat worked-example figures]
Frequent Pattern Mining Without Candidate Generation
Apriori’s Performance Bottlenecks
• The core of the Apriori algorithm
  - Use frequent (k–1)-itemsets to generate candidate frequent k-itemsets
  - Use database scans and pattern matching to collect counts for the candidate itemsets
• Huge candidate sets:
  - 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
  - To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, it needs to generate 2^100 ≈ 10^30 candidates
• Multiple scans of the database: it needs (n+1) scans, where n is the length of the longest pattern
Mining Frequent Patterns Without Candidate Generation
• Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
  - Highly condensed, but complete for frequent pattern mining
  - Avoids costly database scans
• Use an efficient, FP-tree-based frequent pattern mining method
• A divide-and-conquer methodology: decompose mining tasks into smaller ones
• Avoid candidate generation: sub-database tests only
The FP-Growth Algorithm
• Leaves the generate-and-test paradigm of Apriori
• Data sets are encoded using a compact structure, the FP-tree
• Frequent itemsets are extracted directly from the FP-tree
• Major steps to mine an FP-tree (the construction step is sketched below)
  - Construct the conditional pattern base for each node in the FP-tree
  - Construct the conditional FP-tree from each conditional pattern base
  - Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far
  - If the conditional FP-tree contains a single path, simply enumerate all the patterns
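
A minimal sketch of the FP-tree construction step only (the recursive mining is omitted; class and variable names are illustrative), again reusing the transactions list from above:

from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_fp_tree(transactions, minsup_count):
    # keep only frequent items, inserted in frequency-descending order
    counts = Counter(i for t in transactions for i in t)
    frequent = {i: c for i, c in counts.items() if c >= minsup_count}
    root, header = FPNode(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent),
                           key=lambda i: (-frequent[i], i)):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])  # node-links
            node = node.children[item]
            node.count += 1
    return root, header

root, header = build_fp_tree(transactions, 4)
# summing the node counts per item recovers the support counts
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})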
Rule Generation
• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement
• If {A,B,C,D} is a frequent itemset, the candidate rules are ABC → D, ABD → C, ACD → B, BCD → A, A → BCD, B → ACD, C → ABD, D → ABC, AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
• If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L → ∅ and ∅ → L), as the sketch below confirms
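
Enumerating the candidate rules of a frequent itemset takes a few lines with itertools; a small illustrative sketch:

from itertools import chain, combinations

def candidate_rules(L):
    # every non-empty proper subset f of L yields a rule f -> L - f
    L = frozenset(L)
    subsets = chain.from_iterable(combinations(L, r) for r in range(1, len(L)))
    return [(frozenset(f), L - frozenset(f)) for f in subsets]

print(len(candidate_rules({"A", "B", "C", "D"})))  # 2**4 - 2 = 14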
How to efficiently generate rules from frequent itemsets?
• Confidence does not have an anti-monotone property: c(ABC → D) can be larger or smaller than c(AB → D)
• But the confidence of rules generated from the same itemset does have an anti-monotone property; for L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
• Confidence is anti-monotone with respect to the number of items on the right-hand side of the rule
Rule Generation for the Apriori Algorithm
[Figure: the lattice of rules generated from {A,B,C,D}, from ABCD ⇒ {} down to A ⇒ BCD; when a rule is found to have low confidence, all the rules below it, with larger consequents, are pruned]
Rule Generation for the Apriori Algorithm
• A candidate rule is generated by merging two rules that share the same prefix in the rule consequent
• join(CD ⇒ AB, BD ⇒ AC) would produce the candidate rule D ⇒ ABC
• Prune the rule D ⇒ ABC if its subset AD ⇒ BC does not have high confidence
Effect of Support Distribution
• Many real data sets have a skewed support distribution
[Figure: support distribution of a retail data set]
How to Set an Appropriate Minsup?
• If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products)
• If minsup is set too low, mining is computationally expensive and the number of itemsets is very large
• A single minimum support threshold may not be effective
Association Rule Mining in R
library(arules)
library(arulesViz)

data("Adult")
rules <- apriori(Adult,
                 parameter = list(supp = 0.8, conf = 0.9, target = "rules"))
summary(rules)
inspect(rules)
plot(rules)
plot(rules, method = "graph", control = list(type = "items"))
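
For the Python notebooks mentioned at the end of the lecture, a roughly equivalent sketch uses the mlxtend package (assuming it is installed; the encoding step and the thresholds below are illustrative and applied to the small transactions list from above rather than to Adult):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.75)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])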
FP-growth vs. Apriori: Scalability with the Support Threshold
[Figure: runtime (sec.) as a function of the support threshold (%) for FP-growth and Apriori on data set D1]
Benefits of the FP-Tree Structure
• Completeness
  - Preserves complete information for frequent pattern mining
  - Never breaks a long pattern of any transaction
• Compactness
  - Reduces irrelevant information: infrequent items are gone
  - Items are stored in frequency-descending order: the more frequently occurring, the more likely to be shared
  - Never larger than the original database (not counting node-links and the count fields)
  - For the Connect-4 DB, the compression ratio can be over 100
Why is FP-Growth the Winner?
• Divide-and-conquer:
  - Decomposes both the mining task and the DB according to the frequent patterns obtained so far
  - Leads to focused searches of smaller databases
• Other factors
  - No candidate generation, no candidate tests
  - Compressed database: the FP-tree structure
  - No repeated scans of the entire database
  - The basic operations are counting local frequent items and building sub FP-trees; there is no pattern search and matching
Rule Assessment Measures
Lift
• Lift is the ratio of the observed joint probability of X and Y to the expected joint probability if they were statistically independent:
  lift(X → Y) = sup(X ∪ Y) / (sup(X) sup(Y)) = conf(X → Y) / sup(Y)
• Lift is a measure of the deviation from stochastic independence (if it is 1, then X and Y are independent)
• Lift also measures the surprise of the rule: a lift close to 1 means that the support of the rule is expected given the supports of its components
• We typically look for values of lift that are much larger (i.e., above expectation) or much smaller (i.e., below expectation) than one
Leverage
• Leverage is computed as the difference between the observed and the expected joint probability of X ∪ Y, assuming that X and Y are independent:
  leverage(X → Y) = sup(X ∪ Y) − sup(X) sup(Y)
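
Both measures follow directly from supports; a short sketch reusing the helpers defined earlier:

def lift_and_leverage(X, Y, transactions):
    n = len(transactions)
    s_xy = support_count(set(X) | set(Y), transactions) / n
    s_x = support_count(X, transactions) / n
    s_y = support_count(Y, transactions) / n
    return s_xy / (s_x * s_y), s_xy - s_x * s_y

print(lift_and_leverage({"Bread"}, {"Jam"}, transactions))
# (1.6, 0.1875): Bread and Jam co-occur more often than independence predicts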
Summarizing Itemsets
Maximal Frequent Itemsets
• A frequent itemset X is called maximal if it has no frequent supersets
• The set of all maximal frequent itemsets is given as M = {X | X ∈ F and ∄Y ⊃ X such that Y ∈ F}
• M is a condensed representation of the set F of all frequent itemsets, because we can determine whether any itemset is frequent using M alone
• If there is a maximal itemset Z such that X ⊆ Z, then X must be frequent; otherwise X cannot be frequent
• However, M alone cannot be used to determine sup(X); it only gives a lower bound, that is, sup(X) ≥ sup(Z) if X ⊆ Z ∈ M
Closed Frequent Itemsets
• An itemset X is closed if all supersets of X have strictly less support, that is, sup(X) > sup(Y) for all Y ⊃ X
• The set of all closed frequent itemsets C is a condensed representation, as we can determine whether an itemset X is frequent, as well as the exact support of X, using C alone
Minimal Generators
• A frequent itemset X is a minimal generator if it has no subsets with the same support: G = {X | X ∈ F and ∄Y ⊂ X such that sup(X) = sup(Y)}
• Thus, all subsets of X have strictly higher support, that is, sup(X) < sup(Y) for all Y ⊂ X (a sketch computing all three summaries follows)
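
All three summaries can be computed from the full frequent-itemset table by direct comparison; an illustrative sketch building on the apriori and support_count helpers above:

def summarize(freq):
    # freq: dict mapping each frequent itemset (frozenset) to its support count
    maximal, closed, generators = set(), set(), set()
    for X, sup in freq.items():
        supersets = [Y for Y in freq if X < Y]
        subsets = [Y for Y in freq if Y < X]
        if not supersets:
            maximal.add(X)                     # no frequent superset
        if all(freq[Y] < sup for Y in supersets):
            closed.add(X)                      # every superset loses support
        if all(freq[Y] > sup for Y in subsets):
            generators.add(X)                  # every subset gains support
    return maximal, closed, generators

freq = {X: support_count(X, transactions) for X in apriori(transactions, 4)}
maximal, closed, generators = summarize(freq)
print([sorted(X) for X in maximal])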
[Figure: the frequent-itemset lattice with the summaries marked; shaded boxes are closed itemsets, simple boxes are generators, and shaded boxes with double lines are maximal itemsets]
Mining Frequent Sequences
Sequential Pattern Mining
• Association rule mining does not consider the order of transactions
• In many applications such orderings are significant
• In market basket analysis, it is interesting to know whether people buy some items in sequence, for example, buying a bed first and bed sheets later
• In Web usage mining, it is useful to find the navigational patterns of users in a Web site from the sequences of their page visits
Examples of Sequence
• Web sequence: <{Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping}>
• Sequence of initiating events causing the accident at Three Mile Island: <{clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main waterpump trips} {main turbine trips} {reactor pressure increases}>
• Sequence of books checked out at a library: <{Fellowship of the Ring} {The Two Towers} {Return of the King}>
Basic Concepts
• Let I = {i1, i2, …, im} be a set of items
• A sequence is an ordered list of itemsets; we denote a sequence s by <a1, a2, …, ar>
• An element (or itemset) of a sequence is denoted by {x1, x2, …, xk}, where each xi is an item and {x1, x2, …, xk} is a subset of I
• We assume, without loss of generality, that the items in an element of a sequence are in lexicographic order
Basic Concepts
• The size of a sequence is the number of elements (or itemsets) in the sequence
• The length of a sequence is the number of items in the sequence (a sequence of length k is called a k-sequence)
• Given two sequences s1 = <a1, a2, …, ar> and s2 = <b1, b2, …, bv>, we say that s1 is a subsequence of s2 if there exists a set of integers 1 ≤ j1 < j2 < … < jr-1 < jr ≤ v such that ai ⊆ bji for all i
Frequent Sequences
• Given a database D = {s1, …, sN} of N sequences and a sequence r, we define the support count of r as sup(r) = |{si | r is a subsequence of si}|
• And the support (or relative support) as rsup(r) = sup(r)/N
Example
• Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}
• The sequence <{3} {4, 5} {8}> is contained in (or is a subsequence of) <{6} {3, 7} {9} {4, 5, 8} {3, 8}>
• In fact, {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}
• However, <{3} {8}> is not contained in <{3, 8}> or vice versa
• The size of the sequence <{3} {4, 5} {8}> is 3, and its length is 4
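
Subsequence containment is easy to test with a greedy left-to-right scan; a minimal sketch (sequences are lists of Python sets; the function name is illustrative):

def is_subsequence(r, s):
    # match each element of r, in order, to the earliest fitting element of s
    i = 0
    for element in s:
        if i < len(r) and r[i] <= element:
            i += 1
    return i == len(r)

print(is_subsequence([{3}, {4, 5}, {8}],
                     [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]))  # True
print(is_subsequence([{3}, {8}], [{3, 8}]))                   # False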
Goal of Sequential Pattern Mining
• The input is a set S of input data sequences (or a sequence database)
• The problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support
• Each such sequence is called a frequent sequence, or a sequential pattern
• The support for a sequence is the fraction of total data sequences in S that contain this sequence
Terminology
• Itemset
  - Non-empty set of items; each itemset is mapped to an integer
• Sequence
  - Ordered list of itemsets
• Support for a Sequence
  - Fraction of total customers that support a sequence
• Maximal Sequence
  - A sequence that is not contained in any other sequence
• Large Sequence
  - A sequence that meets minsup
Example
• Given the following sequence database [not reproduced here]
• With a minsup of 3, the set of frequent subsequences is
  - A(3), G(3), T(3)
  - AA(3), AG(3), GA(3), GG(3)
  - AAG(3), GAA(3), GAG(3)
  - GAAG(3)
The GSP Algorithm [pseudocode figures]
Example
Note: with a minsup of 40%, no fewer than two customers must support a sequence.

Customer ID  Transaction Time  Items Bought
1            June 25 '93       30
1            June 30 '93       90
2            June 10 '93       10, 20
2            June 15 '93       30
2            June 20 '93       40, 60, 70
3            June 25 '93       30, 50, 70
4            June 25 '93       30
4            June 30 '93       40, 70
4            July 25 '93       90
5            June 12 '93       90

Customer ID  Customer Sequence
1            <(30) (90)>
2            <(10 20) (30) (40 60 70)>
3            <(30) (50) (70)>
4            <(30) (40 70) (90)>
5            <(90)>

Maximal sequences with support > 40%: <(30) (90)> and <(30) (40 70)>
<(10 20) (30)> does not have enough support (it is supported only by customer 2)
<(30)>, <(70)>, <(30) (40)>, … are frequent but not maximal.
Association Rules for Classification
Mining Association Rules for Classification (the CBA algorithm)
• Association rule mining assumes that the data consist of a set of transactions; thus, the typical tabular representation of data used in classification must be mapped into such a format
• Association rule mining is then applied to the new dataset, and the search is focused on association rules whose consequent identifies a class label, X → ci (where ci is a class label)
• The association rules are pruned using the pessimistic error-based method used in C4.5
• Finally, the rules are sorted to build the final classifier
JRip Model

(humidity = high) and (outlook = sunny) => play=no (3.0/0.0)
(outlook = rainy) and (windy = TRUE) => play=no (2.0/0.0)
=> play=yes (9.0/0.0)
One Rule Model

outlook:
  overcast -> yes
  rainy -> yes
  sunny -> no
(10/14 instances correct)
CBA Model

outlook=overcast ==> play=yes
humidity=normal windy=FALSE ==> play=yes
outlook=rainy windy=FALSE ==> play=yes
outlook=sunny humidity=high ==> play=no
outlook=rainy windy=TRUE ==> play=no
(default class is the majority class)
Run the KNIME workflow and the Python notebooks for this lecture.

More Related Content

What's hot

Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsAhmed Gad
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Databricks
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMDivya Gera
 
Causal discovery and prediction mechanisms
Causal discovery and prediction mechanismsCausal discovery and prediction mechanisms
Causal discovery and prediction mechanismsShiga University, RIKEN
 
Introduction to artificial neural network
Introduction to artificial neural networkIntroduction to artificial neural network
Introduction to artificial neural networkDr. C.V. Suresh Babu
 
Multi Objective Optimization and Pareto Multi Objective Optimization with cas...
Multi Objective Optimization and Pareto Multi Objective Optimization with cas...Multi Objective Optimization and Pareto Multi Objective Optimization with cas...
Multi Objective Optimization and Pareto Multi Objective Optimization with cas...Aditya Deshpande
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural networkAbdullah Khan Zehady
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 ISungbin Lim
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionMostafa G. M. Mostafa
 
Autonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an IntroductionAutonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an IntroductionMichael Bosello
 
Graph Representation Learning
Graph Representation LearningGraph Representation Learning
Graph Representation LearningJure Leskovec
 
MACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHMMACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHMPuneet Kulyana
 
Lecture on classical and quantum information
Lecture on classical and quantum informationLecture on classical and quantum information
Lecture on classical and quantum informationKrzysztof Pomorski
 
Probabilistic Reasoning
Probabilistic Reasoning Probabilistic Reasoning
Probabilistic Reasoning Sushant Gautam
 
딥뉴럴넷 클러스터링 실패기
딥뉴럴넷 클러스터링 실패기딥뉴럴넷 클러스터링 실패기
딥뉴럴넷 클러스터링 실패기Myeongju Kim
 

What's hot (20)

Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNs
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
 
Fuzzy c means manual work
Fuzzy c means manual workFuzzy c means manual work
Fuzzy c means manual work
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
Causal discovery and prediction mechanisms
Causal discovery and prediction mechanismsCausal discovery and prediction mechanisms
Causal discovery and prediction mechanisms
 
Particle filter
Particle filterParticle filter
Particle filter
 
Shortest Path in Graph
Shortest Path in GraphShortest Path in Graph
Shortest Path in Graph
 
Introduction to artificial neural network
Introduction to artificial neural networkIntroduction to artificial neural network
Introduction to artificial neural network
 
Multi Objective Optimization and Pareto Multi Objective Optimization with cas...
Multi Objective Optimization and Pareto Multi Objective Optimization with cas...Multi Objective Optimization and Pareto Multi Objective Optimization with cas...
Multi Objective Optimization and Pareto Multi Objective Optimization with cas...
 
Parallel convolutional neural network
Parallel  convolutional neural networkParallel  convolutional neural network
Parallel convolutional neural network
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 I
 
Neural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear RegressionNeural Networks: Model Building Through Linear Regression
Neural Networks: Model Building Through Linear Regression
 
Autonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an IntroductionAutonomous Driving and Reinforcement Learning - an Introduction
Autonomous Driving and Reinforcement Learning - an Introduction
 
Graph Representation Learning
Graph Representation LearningGraph Representation Learning
Graph Representation Learning
 
MACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHMMACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHM
 
Lecture on classical and quantum information
Lecture on classical and quantum informationLecture on classical and quantum information
Lecture on classical and quantum information
 
Scale invariant feature transform
Scale invariant feature transformScale invariant feature transform
Scale invariant feature transform
 
Probabilistic Reasoning
Probabilistic Reasoning Probabilistic Reasoning
Probabilistic Reasoning
 
딥뉴럴넷 클러스터링 실패기
딥뉴럴넷 클러스터링 실패기딥뉴럴넷 클러스터링 실패기
딥뉴럴넷 클러스터링 실패기
 
Restricted boltzmann machine
Restricted boltzmann machineRestricted boltzmann machine
Restricted boltzmann machine
 

Similar to DMTM Lecture 16 Association rules

DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesPier Luca Lanzi
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptraju980973
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsPier Luca Lanzi
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningWan Aezwani Wab
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdfWailaBaba
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14rahulmath80
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptUNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptRaviKiranVarma4
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxssuser957b41
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmhktripathy
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Subrata Kumer Paul
 
Mining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesMining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesRashmi Bhat
 

Similar to DMTM Lecture 16 Association rules (20)

DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association Rules
 
DM -Unit 2-PPT.ppt
DM -Unit 2-PPT.pptDM -Unit 2-PPT.ppt
DM -Unit 2-PPT.ppt
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules Basics
 
Apriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule MiningApriori and Eclat algorithm in Association Rule Mining
Apriori and Eclat algorithm in Association Rule Mining
 
AssociationRule.pdf
AssociationRule.pdfAssociationRule.pdf
AssociationRule.pdf
 
Association 04.03.14
Association   04.03.14Association   04.03.14
Association 04.03.14
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).pptUNIT 3.2 -Mining Frquent Patterns (part1).ppt
UNIT 3.2 -Mining Frquent Patterns (part1).ppt
 
6 module 4
6 module 46 module 4
6 module 4
 
Chapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptxChapter 01 Introduction DM.pptx
Chapter 01 Introduction DM.pptx
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
apriori.pptx
apriori.pptxapriori.pptx
apriori.pptx
 
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
Chapter 6. Mining Frequent Patterns, Associations and Correlations Basic Conc...
 
Apriori Algorithm
Apriori AlgorithmApriori Algorithm
Apriori Algorithm
 
06 fp basic
06 fp basic06 fp basic
06 fp basic
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06FPBasic.ppt
06FPBasic.ppt06FPBasic.ppt
06FPBasic.ppt
 
06FPBasic02.pdf
06FPBasic02.pdf06FPBasic02.pdf
06FPBasic02.pdf
 
Mining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association RulesMining Frequent Patterns And Association Rules
Mining Frequent Patterns And Association Rules
 

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesPier Luca Lanzi
 

More from Pier Luca Lanzi (20)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 

Recently uploaded

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Recently uploaded (20)

This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

DMTM Lecture 16 Association rules

  • 1. Prof. Pier Luca Lanzi Association Rules Data Mining andText Mining (UIC 583 @ Politecnico di Milano)
  • 2. Prof. Pier Luca Lanzi Readings • “Data Mining and Analysis” by Zaki & Meira §Chapter 8 §Section 9.1 §Chapter 10 up to Section 10.2.1 included §Section 12.1 • http://www.dataminingbook.info 2
  • 4. Prof. Pier Luca Lanzi Market-Basket Transactions Bread Peanuts Milk Fruit Jam Bread Jam Soda Chips Milk Fruit Steak Jam Soda Chips Bread Jam Soda Chips Milk Bread Fruit Soda Chips Milk Jam Soda Peanuts Milk Fruit Fruit Soda Peanuts Milk Fruit Peanuts Cheese Yogurt 4
  • 5. Prof. Pier Luca Lanzi Association Rules 5
  • 6. Prof. Pier Luca Lanzi What is Association Rule Mining? • Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories • Applications §Basket data analysis §Cross-marketing §Catalog design §… 6
  • 7. Prof. Pier Luca Lanzi What is Association Rule Mining? 7 • Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction • Examples §{bread} Þ {milk} §{soda} Þ {chips} §{bread} Þ {jam} TID Items 1 Bread, Peanuts, Milk, Fruit, Jam 2 Bread, Jam, Soda, Chips, Milk, Fruit 3 Steak, Jam, Soda, Chips, Bread 4 Jam, Soda, Peanuts, Milk, Fruit 5 Jam, Soda, Chips, Milk, Bread 6 Fruit, Soda, Chips, Milk 7 Fruit, Soda, Peanuts, Milk 8 Fruit, Peanuts, Cheese, Yogurt
  • 8. Prof. Pier Luca Lanzi Frequent Itemset • Itemset § A collection of one or more items, e.g., {milk, bread, jam} § k-itemset, an itemset that contains k items • Support count (s) § Frequency of occurrence of an itemset § s({Milk, Bread}) = 3 s({Soda, Chips}) = 4 • Support § Fraction of transactions that contain an itemset § s({Milk, Bread}) = 3/8 s({Soda, Chips}) = 4/8 • Frequent Itemset § An itemset whose support is greater than or equal to a minsup threshold 8 TID Items 1 Bread, Peanuts, Milk, Fruit, Jam 2 Bread, Jam, Soda, Chips, Milk, Fruit 3 Steak, Jam, Soda, Chips, Bread 4 Jam, Soda, Peanuts, Milk, Fruit 5 Jam, Soda, Chips, Milk, Bread 6 Fruit, Soda, Chips, Milk 7 Fruit, Soda, Peanuts, Milk 8 Fruit, Peanuts, Cheese, Yogurt
  • 9. Prof. Pier Luca Lanzi What is An Association Rule? • Implication of the form X Þ Y, where X and Y are itemsets • Example, {bread} Þ {milk} • Rule Evaluation Metrics, Suppor & Confidence • Support (s) §Fraction of transactions that contain both X and Y • Confidence (c) §Measures how often items in Y appear in transactions that contain X 9
  • 10. Prof. Pier Luca Lanzi What is the Goal? • Given a set of transactions T, the goal of association rule mining is to find all rules having §support ≥ minsup threshold §confidence ≥ minconf threshold • Brute-force approach §List all possible association rules §Compute the support and confidence for each rule §Prune rules that fail the minsup and minconf thresholds • Brute-force approach is computationally prohibitive! 10
  • 11. Prof. Pier Luca Lanzi Mining Association Rules • {Bread, Jam} Þ {Milk} s=0.4 c=0.75 • {Milk, Jam} Þ {Bread} s=0.4 c=0.75 • {Bread} Þ {Milk, Jam} s=0.4 c=0.75 • {Jam} Þ {Bread, Milk} s=0.4 c=0.6 • {Milk} Þ {Bread, Jam} s=0.4 c=0.5 • All the above rules are binary partitions of the same itemset {Milk, Bread, Jam} • Rules originating from the same itemset have identical support but can have different confidence • We can decouple the support and confidence requirements! 11
  • 12. Prof. Pier Luca Lanzi Mining Association Rules 12
  • 13. Prof. Pier Luca Lanzi Mining Association Rules: Two Step Approach • Frequent Itemset Generation §Generate all itemsets whose support ³ minsup • Rule Generation §Generate high confidence rules from frequent itemset §Each rule is a binary partitioning of a frequent itemset • Frequent itemset generation is computationally expensive 13
  • 14. Prof. Pier Luca Lanzi Frequent Itemset Generation null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE Given d items, there are 2d possible candidate itemsets 14
  • 15. Prof. Pier Luca Lanzi Brute Force 15
  • 16. Prof. Pier Luca Lanzi Frequent Itemset Generation • Brute-force approach: §Each itemset in the lattice is a candidate frequent itemset §Count the support of each candidate by scanning the database §Match each transaction against every candidate §Complexity ~ O(NMw) => Expensive since M = 2d TID Items 1 Bread, Peanuts, Milk, Fruit, Jam 2 Bread, Jam, Soda, Chips, Milk, Fruit 3 Steak, Jam, Soda, Chips, Bread 4 Jam, Soda, Peanuts, Milk, Fruit 5 Jam, Soda, Chips, Milk, Bread 6 Fruit, Soda, Chips, Milk 7 Fruit, Soda, Peanuts, Milk 8 Fruit, Peanuts, Cheese, Yogurt w N M candidates 16
  • 17. Prof. Pier Luca Lanzi Computational Complexity • Given d unique items there are 2d possible itemsets • Total number of possible association rules: • For d=6, there are 602 rules 17
  • 18. Prof. Pier Luca Lanzi Frequent Itemset Generation Strategies • Reduce the number of candidates (M) §Complete search has M=2d §Use pruning techniques to reduce M • Reduce the number of transactions (N) §Reduce size of N as the size of itemset increases • Reduce the number of comparisons (NM) §Use efficient data structures to store the candidates or transactions §No need to match every candidate against every transaction 18
  • 19. Prof. Pier Luca Lanzi Reducing the Number of Candidates • Apriori principle §If an itemset is frequent, then all of its subsets must also be frequent • Apriori principle holds due to the following property of the support measure: • Support of an itemset never exceeds the support of its subsets • This is known as the anti-monotone property of support 19
  • 20. Prof. Pier Luca Lanzi Illustrating Apriori Principle Found to be Infrequent null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE null AB AC AD AE BC BD BE CD CE DE A B C D E ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE Pruned supersets 20
  • 21. Prof. Pier Luca Lanzi How does the Apriori principle work? Item Count Bread Peanuts Milk Fruit Jam Soda Chips Steak Cheese Yogurt 4 4 6 6 5 6 4 1 1 1 2-Itemset Count Bread, Jam Peanuts, Fruit Milk, Fruit Milk, Jam Milk, Soda Fruit, Soda Jam, Soda Soda, Chips 4 4 5 4 5 4 4 4 3-Itemset Count Milk, Fruit, Soda 4 1-itemsets Minimum Support = 4 2-itemsets 3-itemsets 21
  • 22. Prof. Pier Luca Lanzi Example • Given the following database and a min support of 3, generate all the frequent itemsets 22
  • 23. Prof. Pier Luca Lanzi 23
  • 24. Prof. Pier Luca Lanzi The Apriori Algorithm • Let k=1 • Generate frequent itemsets of length 1 • Repeat until no new frequent itemsets are identified §Generate length (k+1) candidate itemsets from length k frequent itemsets §Prune candidate itemsets containing subsets of length k that are infrequent §Count the support of each candidate by scanning the DB §Eliminate candidates that are infrequent, leaving only those that are frequent 24
  • 25. Prof. Pier Luca Lanzi Apriori Algorithm 25
  • 26. Prof. Pier Luca Lanzi Apriori Algorithm 26
  • 27. Prof. Pier Luca Lanzi Apriori Algorithm Example 27
  • 28. Prof. Pier Luca Lanzi Eclat Algorithm 28
  • 29. Prof. Pier Luca Lanzi The Eclat Algorithm • Leverages the tidsets directly for support computation. • The support of a candidate itemset can be computed by intersecting the tidsets of suitably chosen subsets. • Given t(X) and t(Y) for any two frequent itemsets X and Y, then t(XY)=t(X) ∧ t(Y) • And sup(XY) = |t(XY)| 29
  • 30. Prof. Pier Luca Lanzi Example 30
  • 31. Prof. Pier Luca Lanzi 31
  • 32. Prof. Pier Luca Lanzi Frequent Patterns Mining Without Candidate Generation
  • 33. Prof. Pier Luca Lanzi Apriori’s Performance Bottlenecks • The core of the Apriori algorithm §Use frequent (k–1)-itemsets to generate candidate frequent k-itemsets §Use database scan and pattern matching to collect counts for the candidate itemsets • Huge candidate sets: §104 frequent 1-itemset will generate 107 candidate 2-itemsets §To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, needs to generate 2100 ~1030 candidates. §Multiple scans of database, it needs (n+1) scans, n is the length of the longest pattern 33
  • 34. Prof. Pier Luca Lanzi Mining Frequent Patterns Without Candidate Generation • Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure §Highly condensed, but complete for frequent pattern mining §Avoid costly database scans • Use an efficient, FP-tree-based frequent pattern mining method • A divide-and-conquer methodology: decompose mining tasks into smaller ones • Avoid candidate generation: sub-database test only 34
  • 35. Prof. Pier Luca Lanzi The FP-Growth Algorithm • Leave the generate-and-test paradigm of Apriori • Data sets are encoded using a compact structure, the FP-tree • Frequent itemsets are extracted directly from the FP-tree • Major Steps to mine FP-tree §Construct conditional pattern base for each node in the FP-tree §Construct conditional FP-tree from each conditional pattern-base §Recursively mine conditional FP-trees and grow frequent patterns obtained so far §If the conditional FP-tree contains a single path, simply enumerate all the patterns 35
  • 36. Prof. Pier Luca Lanzi 36
  • 39. Prof. Pier Luca Lanzi Rule Generation
  • 40. Prof. Pier Luca Lanzi Rule Generation • Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement • If {A,B,C,D} is a frequent itemset, candidate rules: ABC → D, ABD → C, ACD → B, BCD → A, A → BCD, B → ACD, C → ABD, D → ABC, AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB • If |L| = k, then there are 2k – 2 candidate association rules (ignoring L → ⦰ and ⦰ → L) 40
  • 41. Prof. Pier Luca Lanzi How to efficiently generate rules from frequent itemsets? • Confidence does not have an anti-monotone property • c(ABC → D) can be larger or smaller than c(AB → D) • But the confidence of rules generated from the same itemset has an anti-monotone property • L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD) • Confidence is anti-monotone with respect to the number of items on the right-hand side of the rule 41
  • 42. Prof. Pier Luca Lanzi Rule Generation for Apriori Algorithm [Figure: lattice of the candidate rules generated from the frequent itemset {A,B,C,D}, from ABCD ⇒ {} down to A ⇒ BCD; once a low-confidence rule is found, all rules below it in the lattice are pruned] 42
  • 43. Prof. Pier Luca Lanzi Rule Generation for Apriori Algorithm • A candidate rule is generated by merging two rules that share the same prefix in the rule consequent • join(CD ⇒ AB, BD ⇒ AC) produces the candidate rule D ⇒ ABC • Prune rule D ⇒ ABC if its subset rule AD ⇒ BC does not have high confidence 43
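A small Python sketch of this idea follows (assuming `support` is the itemset-to-support-count map produced by a frequent itemset miner such as the Apriori sketch earlier; for simplicity it extends each confident consequent one item at a time, a looser variant of the prefix-merge step on the slide, but the anti-monotone property still guarantees that no high-confidence rule is missed):

def generate_rules(itemset, support, minconf):
    """Rules from one frequent itemset, growing the consequent
    level-wise; only confident rules are extended, which is safe
    because confidence is anti-monotone in the consequent size."""
    itemset = frozenset(itemset)
    rules = []
    consequents = [frozenset([i]) for i in itemset]  # 1-item consequents
    while consequents:
        next_level = set()
        for cons in consequents:
            ante = itemset - cons
            if not ante:
                continue
            conf = support[itemset] / support[ante]
            if conf >= minconf:
                rules.append((ante, cons, conf))
                # Extend the consequent of confident rules only
                for item in ante:
                    next_level.add(cons | {item})
        consequents = next_level
    return rules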
  • 44. Prof. Pier Luca Lanzi 44
  • 45. Prof. Pier Luca Lanzi Effect of Support Distribution • Many real data sets have a skewed support distribution [Figure: support distribution of a retail data set] 45
  • 46. Prof. Pier Luca Lanzi How to Set an Appropriate Minsup? • If minsup is set too high, we could miss itemsets involving interesting rare items (e.g., expensive products) • If minsup is set too low, mining becomes computationally expensive and the number of itemsets very large • A single minimum support threshold may therefore not be effective 46
  • 47. Prof. Pier Luca Lanzi Association Rule Mining in R
library(arules)
library(arulesViz)
# Load the built-in Adult transaction data set
data("Adult")
# Mine association rules with minimum support 0.8 and confidence 0.9
rules <- apriori(Adult, parameter = list(supp = 0.8, conf = 0.9, target = "rules"))
summary(rules)
inspect(rules)
# Visualize the rules (scatter plot and item graph)
plot(rules)
plot(rules, method = "graph", control = list(type = "items"))
47
  • 48. Prof. Pier Luca Lanzi FP-growth vs. Apriori: Scalability with the Support Threshold [Plot: runtime (sec.) vs. support threshold (%) on data set D1, comparing FP-growth and Apriori runtimes] 48
  • 49. Prof. Pier Luca Lanzi Benefits of the FP-Tree Structure • Completeness §Preserves complete information for frequent pattern mining §Never breaks a long pattern of any transaction • Compactness §Reduces irrelevant information (infrequent items are gone) §Items are in frequency-descending order: the more frequently an item occurs, the more likely it is to be shared §The tree is never larger than the original database (not counting node-links and the count fields) §For the Connect-4 DB, the compression ratio can be over 100 49
  • 50. Prof. Pier Luca Lanzi Why is FP-Growth the Winner? • Divide-and-conquer: §decomposes both the mining task and the DB according to the frequent patterns obtained so far §leads to focused searches of smaller databases • Other factors §No candidate generation, no candidate test §Compressed database: the FP-tree structure §No repeated scans of the entire database §Basic operations are counting local frequent items and building sub FP-trees; no pattern search and matching 50
  • 51. Prof. Pier Luca Lanzi Rule Assessment Measures
  • 52. Prof. Pier Luca Lanzi Lift • Lift is the ratio of the observed joint probability of X and Y to the expected joint probability if they were statistically independent, lift(X → Y) = sup(X ∪ Y)/(sup(X)·sup(Y)) = conf(X → Y)/sup(Y) • Lift is a measure of the deviation from stochastic independence (if it is 1, then X and Y are independent) • Lift also measures the surprise of the rule: a lift close to 1 means that the support of the rule is what is expected given the supports of its components • We typically look for values of lift that are much larger (i.e., above expectation) or much smaller (i.e., below expectation) than one 52
  • 53. Prof. Pier Luca Lanzi Leverage • Leverage is computed as the difference between the observed and expected joint probability of XY, assuming that X and Y are independent: leverage(X → Y) = sup(X ∪ Y) – sup(X)·sup(Y) 53
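Both measures reduce to one-line computations once the supports are known. A small Python illustration with made-up support values (chosen only to show the arithmetic, not taken from any slide):

def lift(sup_xy, sup_x, sup_y):
    # lift(X -> Y) = sup(X u Y) / (sup(X) * sup(Y)), supports as fractions
    return sup_xy / (sup_x * sup_y)

def leverage(sup_xy, sup_x, sup_y):
    # leverage(X -> Y) = sup(X u Y) - sup(X) * sup(Y)
    return sup_xy - sup_x * sup_y

# Hypothetical supports: sup(X u Y) = 0.4, sup(X) = 0.5, sup(Y) = 0.625
print(lift(0.4, 0.5, 0.625))      # 1.28 -> above independence
print(leverage(0.4, 0.5, 0.625))  # 0.0875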
  • 54. Prof. Pier Luca Lanzi Summarizing Itemsets
  • 55. Prof. Pier Luca Lanzi Maximal Frequent Itemsets • A frequent itemset X is called maximal if it has no frequent supersets • The set of all maximal frequent itemsets is given as M = {X | X ∈ F and ∄Y ⊃ X such that Y ∈ F} • M is a condensed representation of the set of all frequent itemsets F, because we can determine whether any itemset is frequent or not using M • If there is a maximal itemset Z such that X ⊆ Z, then X must be frequent; otherwise, X cannot be frequent • However, M alone cannot be used to determine sup(X); we can only use it to obtain a lower bound, that is, sup(X) ≥ sup(Z) if X ⊆ Z ∈ M 55
  • 56. Prof. Pier Luca Lanzi Closed Frequent Itemsets • An itemset X is closed if all supersets of X have strictly less support, that is, sup(X) > sup(Y), for all Y ⊃ X • The set of all closed frequent itemsets C is a condensed representation, as we can determine whether an itemset X is frequent, as well as the exact support of X using C alone 56
  • 57. Prof. Pier Luca Lanzi Minimal Generators • A frequent itemset X is a minimal generator if it has no subsets with the same support G = {X | X ∈ F and ∄Y ⊂ X such that sup(X) = sup(Y)} • Thus, all subsets of X have strictly higher support, that is, sup(X) < sup(Y) for all Y ⊂ X 57
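Given the full map of frequent itemsets and supports, the three summaries can be derived by comparing each itemset only with the itemsets that differ from it by one item, since support is monotone along the subset lattice. A Python sketch under that assumption (names are illustrative, and the empty set is ignored when testing generators):

def summarize(frequent):
    """frequent: {frozenset: support_count}. Returns the maximal,
    closed, and minimal-generator itemsets by checking only
    immediate supersets/subsets (one item larger or smaller)."""
    maximal, closed, generators = set(), set(), set()
    items = {i for s in frequent for i in s}
    for s, sup in frequent.items():
        supersets = [s | {i} for i in items - s]
        if not any(t in frequent for t in supersets):
            maximal.add(s)            # no frequent superset at all
        if all(frequent.get(t, 0) < sup for t in supersets):
            closed.add(s)             # every superset loses support
        subsets = [s - {i} for i in s]
        if all(frequent.get(t, sup + 1) > sup for t in subsets if t):
            generators.add(s)         # every nonempty subset gains support
    return maximal, closed, generators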
  • 58. Prof. Pier Luca Lanzi 58
  • 59. Prof. Pier Luca Lanzi Shaded boxes are closed itemsets Simple boxes are generators Shaded boxes with double lines are maximal itemsets
  • 60. Prof. Pier Luca Lanzi Mining Frequent Sequences
  • 61. Prof. Pier Luca Lanzi Mining Frequent Sequences
  • 62. Prof. Pier Luca Lanzi Sequential Pattern Mining • Association rule mining does not consider the order of transactions • In many applications such orderings are significant • In market basket analysis, it is interesting to know whether people buy some items in sequence, for example, buying a bed first and bed sheets later • In Web usage mining, it is useful to find the navigational patterns of users in a Web site from the sequences of their page visits 62
  • 63. Prof. Pier Luca Lanzi Examples of Sequence • Web sequence §<{Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation} {Return to Shopping}> • Sequence of initiating events causing the accident at Three Mile Island: §<{clogged resin} {outlet valve closure} {loss of feedwater} {condenser polisher outlet valve shut} {booster pumps trip} {main water pump trips} {main turbine trips} {reactor pressure increases}> • Sequence of books checked out at a library: §<{Fellowship of the Ring} {The Two Towers} {Return of the King}> 63
  • 64. Prof. Pier Luca Lanzi Basic Concepts • Let I = {i1, i2, …, im} be a set of items • A sequence is an ordered list of itemsets • We denote a sequence s by <a1, a2, …, ar> • An element (or itemset) of a sequence is denoted by {x1, x2, …, xk}, where xi is a subset of {i1, i2, …, im} • We assume, without loss of generality, that the items in an element of a sequence are in lexicographic order 64
  • 65. Prof. Pier Luca Lanzi Basic Concepts • The size of a sequence is the number of elements (or itemsets) in the sequence • The length of a sequence is the number of items in the sequence (a sequence of length k is called a k-sequence) • Given two sequences s1 = <a1, a2, …, ar> and s2 = <b1, b2, …, bv>, we say that s1 is a subsequence of s2 §if there exists a set of integers 1 ≤ j1 < j2 < … < jr-1 < jr ≤ v such that, for all i, ai ⊆ bji 65
  • 66. Prof. Pier Luca Lanzi Frequent Sequences • Given a database D = {s1, …, sN} of N sequences and a sequence r, we define the support count of r as sup(r) = |{si | r is a subsequence of si}| • and the support (or relative support) as rsup(r) = sup(r)/N 66
  • 67. Prof. Pier Luca Lanzi Example • Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9} • The sequence <{3} {4, 5} {8}> is contained in (i.e., is a subsequence of) <{6} {3, 7} {9} {4, 5, 8} {3, 8}> • In fact, {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8} • However, <{3} {8}> is not contained in <{3, 8}> or vice versa • The size of the sequence <{3} {4, 5} {8}> is 3, and its length is 4 67
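The containment test in this example generalizes to a short greedy procedure. Here is a Python sketch (names illustrative, not from the slides) that checks subsequence containment and computes the support count of a sequence:

def is_subsequence(s, t):
    """True if sequence s (a list of itemsets) is contained in t:
    each element of s maps, in order, to a superset element of t."""
    j = 0
    for element in s:
        element = set(element)
        while j < len(t) and not element <= set(t[j]):
            j += 1
        if j == len(t):
            return False
        j += 1  # the next element of s must match strictly later in t
    return True

def sequence_support(r, database):
    """Number of sequences in the database that contain r."""
    return sum(1 for s in database if is_subsequence(r, s))

# The example above, rechecked:
print(is_subsequence([{3}, {4, 5}, {8}],
                     [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]))  # True
print(is_subsequence([{3}, {8}], [{3, 8}]))                   # False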
  • 68. Prof. Pier Luca Lanzi Goal of Sequential Pattern Mining • The input is a set S of input data sequences (or sequence database) • The problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support • Each such sequence is called a frequent sequence, or a sequential pattern • The support for a sequence is the fraction of total data sequences in S that contain this sequence 68
  • 69. Prof. Pier Luca Lanzi Terminology • Itemset §Non-empty set of items §Each itemset is mapped to an integer • Sequence §Ordered list of itemsets • Support for a Sequence §Fraction of total customers that support a sequence • Maximal Sequence §A sequence that is not contained in any other sequence • Large Sequence §A sequence that meets minsup 69
  • 70. Prof. Pier Luca Lanzi Example • Given the following sequence database • With a minsup of 3, the set of frequent subsequences is §A(3), G(3), T(3) §AA(3), AG(3), GA(3), GG(3) §AAG(3), GAA(3), GAG(3) §GAAG(3) 70
  • 71. Prof. Pier Luca Lanzi The GSP Algorithm 71
  • 72. Prof. Pier Luca Lanzi The GSP Algorithm 72
  • 73. Prof. Pier Luca Lanzi The GSP Algorithm 73
  • 75. Prof. Pier Luca Lanzi Example
Customer ID | Transaction Time | Items Bought
1 | June 25 '93 | 30
1 | June 30 '93 | 90
2 | June 10 '93 | 10, 20
2 | June 15 '93 | 30
2 | June 20 '93 | 40, 60, 70
3 | June 25 '93 | 30, 50, 70
4 | June 25 '93 | 30
4 | June 30 '93 | 40, 70
4 | July 25 '93 | 90
5 | June 12 '93 | 90
Customer sequences:
1: < (30) (90) >
2: < (10 20) (30) (40 60 70) >
3: < (30 50 70) >
4: < (30) (40 70) (90) >
5: < (90) >
Note: with a minsup of 40%, no less than two customers must support a sequence. The maximal sequences with support ≥ 40% are < (30) (90) > and < (30) (40 70) >. The sequence < (10 20) (30) > does not have enough support (only customer 2 supports it), while < (30) >, < (70) >, < (30) (40) >, … are frequent but not maximal. 75
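Reusing is_subsequence/sequence_support from the earlier sketch, the customer sequences above can be checked directly (data transcribed from the table; the counts match what the slide reports):

db = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]
minsup = 0.4 * len(db)  # at least 2 of the 5 customers
print(sequence_support([{30}, {90}], db))      # 2 -> frequent
print(sequence_support([{30}, {40, 70}], db))  # 2 -> frequent
print(sequence_support([{10, 20}, {30}], db))  # 1 -> not frequent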
  • 76. Prof. Pier Luca Lanzi Association Rules for Classification 76
  • 77. Prof. Pier Luca Lanzi Mining Association Rules for Classification (the CBA algorithm) • Association rule mining assumes that the data consist of a set of transactions, so the typical tabular representation of data used in classification must first be mapped into such a format • Association rule mining is then applied to the new dataset, and the search is focused on association rules whose consequent is a class label, X → ci (where ci is a class label) • The association rules are pruned using the pessimistic-error-based method used in C4.5 • Finally, the rules are sorted to build the final classifier 77
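A toy Python sketch of the transaction-encoding step and the class-constrained rule search follows. This is a naive enumeration for illustration only (antecedents capped at two items), not the Apriori-style search plus pessimistic-error pruning and rule sorting that the actual CBA algorithm uses; the example rows are loosely modeled on the weather/play data used on the next slides:

from itertools import combinations

def class_association_rules(rows, class_attr, minsup, minconf):
    """Map each tabular row to a transaction of attribute=value items,
    then keep only rules whose consequent is a class label."""
    trans = [frozenset(f"{a}={v}" for a, v in row.items()) for row in rows]
    n = len(trans)
    labels = {f"{class_attr}={row[class_attr]}" for row in rows}
    non_class = sorted({i for t in trans for i in t} - labels)
    rules = []
    for r in range(1, 3):  # antecedents of up to 2 items, for brevity
        for ante in combinations(non_class, r):
            ante = frozenset(ante)
            cover = [t for t in trans if ante <= t]
            if not cover or len(cover) / n < minsup:
                continue
            for label in labels:
                hits = sum(1 for t in cover if label in t)
                sup, conf = hits / n, hits / len(cover)
                if sup >= minsup and conf >= minconf:
                    rules.append((sorted(ante), label, sup, conf))
    return rules

rows = [
    {"outlook": "sunny", "humidity": "high", "play": "no"},
    {"outlook": "sunny", "humidity": "normal", "play": "yes"},
    {"outlook": "overcast", "humidity": "high", "play": "yes"},
    {"outlook": "rainy", "humidity": "high", "play": "no"},
]
print(class_association_rules(rows, "play", minsup=0.25, minconf=1.0))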
  • 78. Prof. Pier Luca Lanzi JRip Model
(humidity = high) and (outlook = sunny) => play=no (3.0/0.0)
(outlook = rainy) and (windy = TRUE) => play=no (2.0/0.0)
=> play=yes (9.0/0.0)
78
  • 79. Prof. Pier Luca Lanzi One Rule Model
outlook:
overcast -> yes
rainy -> yes
sunny -> no
(10/14 instances correct)
79
  • 80. Prof. Pier Luca Lanzi CBA Model
outlook=overcast ==> play=yes
humidity=normal windy=FALSE ==> play=yes
outlook=rainy windy=FALSE ==> play=yes
outlook=sunny humidity=high ==> play=no
outlook=rainy windy=TRUE ==> play=no
(default class is the majority class)
80
  • 81. Prof. Pier Luca Lanzi Run the KNIME workflow and the Python notebooks for this lecture