This document summarizes a lecture on association rules in data mining. It introduces market baskets and frequent itemsets and describes the A-Priori algorithm for finding frequent itemsets. It then discusses improvements such as the PCY algorithm and the multistage algorithm. It also covers high-correlation mining, which finds high-confidence rules involving rare items whose support is too low for support-based pruning to find them. Finally, it covers locality-sensitive hashing (LSH), which compares the minhash signatures of item columns efficiently.
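The last two ideas can be illustrated with a minimal Python sketch: minhash each item's column, then band the signatures with LSH so that only items colliding in some band are compared exactly. It assumes each item's column is stored as a set of basket (row) indices. It simulates random row permutations with hash functions of the form (a*r + b) mod p, a common approximation that is not necessarily the lecture's exact construction. The names `minhash_signatures`, `lsh_candidate_pairs` and `jaccard`, and the band parameters, are hypothetical.

```python
import random
from itertools import combinations

PRIME = 2_147_483_647  # assumed larger than any row (basket) index


def minhash_signatures(columns, num_hashes, seed=0):
    """Map each item to a list of num_hashes minhash values.

    columns: dict item -> set of row (basket) indices containing the item.
    Each (a, b) defines h(r) = (a*r + b) mod PRIME, standing in for a random
    permutation of the rows (an assumption of this sketch).
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    return {
        item: [min(((a * r + b) % PRIME for r in rows), default=PRIME)
               for a, b in params]
        for item, rows in columns.items()
    }


def lsh_candidate_pairs(signatures, bands, rows_per_band):
    """Return item pairs whose signatures agree on every row of some band."""
    buckets = {}
    for item, sig in signatures.items():
        for band in range(bands):
            chunk = tuple(sig[band * rows_per_band:(band + 1) * rows_per_band])
            # Keying on (band, chunk) keeps different bands in separate buckets.
            buckets.setdefault((band, chunk), []).append(item)
    candidates = set()
    for items in buckets.values():
        candidates.update(combinations(sorted(items), 2))
    return candidates


def jaccard(s, t):
    """Exact Jaccard similarity of two columns, used to verify candidates."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


if __name__ == "__main__":
    cols = {"a": {0, 1, 2, 3}, "b": {0, 1, 2, 3}, "c": {7, 8}}
    sigs = minhash_signatures(cols, num_hashes=20, seed=1)
    for x, y in lsh_candidate_pairs(sigs, bands=5, rows_per_band=4):
        print(x, y, jaccard(cols[x], cols[y]))
```

Pairs that share a bucket in at least one band are only candidates. Checking them with the exact `jaccard` on the original columns removes false positives. For a fixed signature length, using more bands with fewer rows each catches less similar pairs, at the cost of more candidates to verify.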
1. CS 361A (Advanced Data Structures and Algorithms), Lecture 20 (Dec 7, 2005). Data Mining: Association Rules. Rajeev Motwani (partially based on notes by Jeff Ullman)
17. Memory Usage – A-Priori
[Figure: main-memory layout of A-Priori. Pass 1 memory: candidate items. Pass 2 memory: frequent items and candidate pairs.]
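The two-pass memory layout of A-Priori can be sketched in Python for the special case of frequent pairs. This is a minimal illustration that assumes the baskets fit in a Python list and that items are sortable. `apriori_pairs` and `support` (an absolute count threshold) are hypothetical names, not taken from the lecture.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(baskets, support):
    """Return {pair: count} for item pairs appearing in >= support baskets."""
    # Pass 1: memory holds one counter per candidate item.
    item_counts = Counter()
    for basket in baskets:
        item_counts.update(set(basket))
    frequent_items = {i for i, c in item_counts.items() if c >= support}

    # Pass 2: memory holds the frequent items plus counters for candidate
    # pairs, i.e. pairs whose two items are both frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c for p, c in pair_counts.items() if c >= support}


if __name__ == "__main__":
    baskets = [["milk", "bread", "beer"], ["milk", "bread"],
               ["milk", "beer"], ["bread", "diapers"]]
    print(apriori_pairs(baskets, support=2))
```

The monotonicity argument is what keeps Pass 2 small. A pair cannot be frequent unless both of its items are frequent, so pairs containing an infrequent item never need a counter.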
19. Memory Usage – PCY
[Figure: main-memory layout of PCY. Pass 1 memory: candidate items and a hash table. Pass 2 memory: frequent items, a bitmap, and candidate pairs.]
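The PCY layout can be sketched the same way. The memory left over in Pass 1 holds a hash table of bucket counts, and between passes it shrinks to a bitmap of frequent buckets. This is a minimal sketch that assumes items are sortable and have a string form. The pair hash (`zlib.crc32` over a joined string) and the names `pcy_pairs` and `num_buckets` are illustrative choices, not the lecture's.

```python
import zlib
from collections import Counter
from itertools import combinations


def pcy_pairs(baskets, support, num_buckets):
    """Return {pair: count} for item pairs appearing in >= support baskets."""

    def bucket(pair):
        # Hypothetical pair hash; crc32 keeps bucket numbers stable across runs.
        return zlib.crc32(f"{pair[0]}|{pair[1]}".encode()) % num_buckets

    # Pass 1: count items, and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    # Between passes: keep frequent items and summarize buckets as a bitmap
    # (a list of bools here; a real implementation packs 1 bit per bucket).
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    bitmap = [c >= support for c in bucket_counts]

    # Pass 2: count a pair only if both items are frequent AND its bucket is.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        for pair in combinations(kept, 2):
            if bitmap[bucket(pair)]:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= support}


if __name__ == "__main__":
    baskets = [["milk", "bread", "beer"], ["milk", "bread"],
               ["milk", "beer"], ["bread", "diapers"]]
    print(pcy_pairs(baskets, support=2, num_buckets=4))
```

A frequent pair always lands in a bucket whose count is at least `support`. The bitmap therefore never discards a truly frequent pair: it only prunes candidates, so the output matches A-Priori's while Pass 2 keeps fewer pair counters.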