4. Literature
Data Mining: Concepts and Techniques, by J. Han and M. Kamber, Morgan Kaufmann Publishers, 2001
Pattern Classification, by R. Duda, P. Hart and D. Stork, 2nd edition, John Wiley & Sons, 2001
35. Data Mining at Work
Data Sources: Single, Multiple, Numerous
Project Objectives and example applications: Diagnostics; Target Marketing; Effluent Quality Control; Decision Support; Automation; Transaction Management; Cost Prediction (Warranty, Insurance Claims); Warranty Clustering; Territorial Ratemaking; Web Information Retrieval, Archival and Clustering; Auto Loss Ratio Predictions; Precision Farming; Bio-Informatics; Functional Foods; Heterogeneous Data Visualization; Crime Data Analysis; Data Fusion and Visualization; Survey Study of Disability
40. Market Basket Example
Is soda typically purchased with bananas?
Does the brand of soda make a difference?
Where should detergents be placed in the store to maximize their sales?
Are window cleaning products purchased when detergents and orange juice are bought together?
How are the demographics of the neighborhood affecting what customers are buying?
43. How Does It Work?

Grocery Point-of-Sale Transactions

Customer  Items
1         Orange Juice, Soda
2         Milk, Orange Juice, Window Cleaner
3         Orange Juice, Detergent
4         Orange Juice, Detergent, Soda
5         Window Cleaner, Soda

Co-Occurrence of Products

                OJ  Window Cleaner  Milk  Soda  Detergent
OJ              4   1               1     2     2
Window Cleaner  1   2               1     1     0
Milk            1   1               1     0     0
Soda            2   1               0     3     1
Detergent       2   0               0     1     2
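
The co-occurrence counts can be derived mechanically from the five transactions. The following is a minimal sketch, not the method used in the slides; the function name `co_occurrence` and the lowercase item spellings are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# The five point-of-sale transactions from the slide, one set per customer.
transactions = [
    {"orange juice", "soda"},
    {"milk", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soda"},
    {"window cleaner", "soda"},
]

def co_occurrence(transactions):
    """Count, for every pair of items, the transactions containing both.

    The diagonal entry (item, item) is the number of transactions
    that contain the item at all.
    """
    counts = Counter()
    for basket in transactions:
        for item in basket:
            counts[(item, item)] += 1
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # keep the table symmetric
    return counts

counts = co_occurrence(transactions)
# Orange juice and soda are bought together by customers 1 and 4.
print(counts[("orange juice", "soda")])   # 2
# Milk and soda never share a basket; a Counter returns 0 for missing pairs.
print(counts[("milk", "soda")])           # 0
```

Storing both `(a, b)` and `(b, a)` doubles the memory but lets a row of the table be read off directly for any product.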
47. Confidence and Support

Transaction ID  Items
1               { 1, 2, 3 }
2               { 1, 3 }
3               { 1, 4 }
4               { 2, 5, 6 }

Frequent One Item Set  Support
{ 1 }                  75 %
{ 2 }                  50 %
{ 3 }                  50 %
{ 4 }                  25 %

Frequent Two Item Set  Support
{ 1, 2 }               25 %
{ 1, 3 }               50 %
{ 1, 4 }               25 %
{ 2, 3 }               25 %

Minimum support = 50 % = 2 transactions; minimum confidence = 50 %.
For the rule 1 => 3:
Support = Support({1,3}) = 50 %
Confidence(1 => 3) = Support({1,3}) / Support({1}) = 50 % / 75 % = 67 % (rounded)
Confidence(3 => 1) = Support({1,3}) / Support({3}) = 50 % / 50 % = 100 %
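
The two measures can be checked with a few lines of Python. This is a minimal sketch over the four transactions on the slide; the helper names `support` and `confidence` are chosen here for illustration.

```python
# The four transactions from the slide, keyed implicitly by position.
transactions = [{1, 2, 3}, {1, 3}, {1, 4}, {2, 5, 6}]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({1, 3}, transactions))            # 0.5
print(confidence({1}, {3}, transactions))       # 0.666..., i.e. about 67 %
print(confidence({3}, {1}, transactions))       # 1.0
```

The asymmetry between the two confidence values shows why a rule and its reverse have to be evaluated separately even though they share the same support.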
50. Choosing the Right Set of Items

Partial Product Taxonomy (from general at the top to specific at the bottom)

Frozen Foods
  Frozen Desserts
    Frozen Yogurt
    Frozen Fruit Bars
    Ice Cream
      Rocky Road
      Chocolate
      Strawberry
      Vanilla
      Cherry Garcia
      Other
  Frozen Vegetables
    Peas
    Carrots
    Mixed
    Other
  Frozen Dinners
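
Choosing the level of the taxonomy amounts to rolling each purchased item up to one of its ancestors before counting. The sketch below assumes the parent links shown in the `PARENT` dictionary (a partial reading of the taxonomy, leaving out the "Mixed" and "Other" entries); the names `PARENT`, `ancestors` and `generalize` are hypothetical.

```python
# Assumed child -> parent links for part of the frozen-foods taxonomy.
PARENT = {
    "Frozen Desserts": "Frozen Foods",
    "Frozen Vegetables": "Frozen Foods",
    "Frozen Dinners": "Frozen Foods",
    "Frozen Yogurt": "Frozen Desserts",
    "Frozen Fruit Bars": "Frozen Desserts",
    "Ice Cream": "Frozen Desserts",
    "Peas": "Frozen Vegetables",
    "Carrots": "Frozen Vegetables",
    "Rocky Road": "Ice Cream",
    "Chocolate": "Ice Cream",
    "Strawberry": "Ice Cream",
    "Vanilla": "Ice Cream",
    "Cherry Garcia": "Ice Cream",
}

def ancestors(item):
    """Return the path from `item` up to the root, item first."""
    path = [item]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def generalize(item, level):
    """Roll `item` up to depth `level` (0 = root).

    Items that already sit at or above `level` are returned unchanged.
    """
    path = ancestors(item)[::-1]  # root first
    return path[min(level, len(path) - 1)]

basket = ["Rocky Road", "Vanilla", "Peas"]
print([generalize(item, 1) for item in basket])
# ['Frozen Desserts', 'Frozen Desserts', 'Frozen Vegetables']
```

Analysing at a more general level raises the support of each item, at the cost of rules that say less about individual products.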
51. Example - Minimum Support Pruning / Rule Generation

Transaction ID  Items
1               { 1, 3, 4 }
2               { 2, 3, 5 }
3               { 1, 2, 3, 5 }
4               { 2, 5 }

Scan Database: support of each single item

Itemset  Support
{ 1 }    2
{ 2 }    3
{ 3 }    3
{ 4 }    1
{ 5 }    3

After minimum support pruning

Itemset  Support
{ 2 }    3
{ 3 }    3
{ 5 }    3

Find Pairings, Find Level of Support: pairs built from { 2 }, { 3 }, { 5 }

Itemset   Support
{ 2, 3 }  2
{ 2, 5 }  3
{ 3, 5 }  2

After minimum support pruning

Itemset   Support
{ 2, 5 }  3

Two rules with the highest support for the two-item set: 2 => 5 and 5 => 2
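
The scan, pair and prune cycle can be sketched as a small level-wise search in the style of Apriori. The slide does not state the threshold; the pruning it shows is consistent with a minimum support count of 3, which is the assumption used here. The function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

# The four transactions from the slide.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def frequent_itemsets(transactions, min_count):
    """Level-wise search: count candidates, prune, build the next level."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Scan the database: count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        # Prune candidates below the minimum support count.
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Build size-(k+1) candidates whose size-k subsets all survived.
        k += 1
        survivors = sorted({i for c in level for i in c})
        candidates = [
            frozenset(c)
            for c in combinations(survivors, k)
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

result = frequent_itemsets(transactions, min_count=3)  # assumed threshold
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# [2] 3
# [3] 3
# [5] 3
# [2, 5] 3
```

Because { 2 } and { 5 } each occur in exactly the three transactions that contain { 2, 5 }, both rules 2 => 5 and 5 => 2 have a confidence of 3/3.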
59. Decision Tree for Concept: PlayTennis

Outlook?
  Sunny: Humidity?
    High: No
    Normal: Yes
  Overcast: Yes
  Rain: Wind?
    Strong: No
    Light: Yes
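
A decision tree of this kind is equivalent to a set of nested tests. The sketch below assumes the tree reads as follows: Sunny leads to a Humidity test (High gives No, Normal gives Yes), Overcast gives Yes, and Rain leads to a Wind test (Strong gives No, Light gives Yes). The function name `play_tennis` is chosen for illustration.

```python
def play_tennis(outlook, humidity, wind):
    """Classify one day by walking the PlayTennis tree from the root."""
    if outlook == "Sunny":
        # Sunny days are decided by humidity alone.
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        # Overcast is a leaf: always play.
        return "Yes"
    if outlook == "Rain":
        # Rainy days are decided by wind alone.
        return "Yes" if wind == "Light" else "No"
    raise ValueError(f"unknown outlook: {outlook!r}")

print(play_tennis("Sunny", "High", "Light"))     # No
print(play_tennis("Rain", "High", "Light"))      # Yes
print(play_tennis("Overcast", "High", "Strong")) # Yes
```

Each root-to-leaf path tests at most two attributes, so Temperature never influences the outcome in this tree.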
60. Decision Trees and Decision Boundaries
How to Visualize Decision Trees?
Example: Dividing Instance Space into Axis-Parallel Rectangles
[Figure: positive (+) and negative (-) examples in the x-y plane, partitioned at x = 1, x = 3, y = 5 and y = 7 by a tree with the tests y > 7?, x < 3?, y < 5? and x < 1?]
More than two variables?
61. An Illustrative Example

Training Examples for Concept PlayTennis

Day  Outlook   Temperature  Humidity  Wind    PlayTennis?
1    Sunny     Hot          High      Light   No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Light   Yes
4    Rain      Mild         High      Light   Yes
5    Rain      Cool         Normal    Light   Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Light   No
9    Sunny     Cool         Normal    Light   Yes
10   Rain      Mild         Normal    Light   Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Light   Yes
14   Rain      Mild         High      Strong  No
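
A quick way to get a feel for the training examples is to tally the PlayTennis labels for each value of an attribute. The following is a minimal sketch; the tuple layout and the helper name `class_counts_by` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# (day, outlook, temperature, humidity, wind, play_tennis)
ROWS = [
    (1, "Sunny", "Hot", "High", "Light", "No"),
    (2, "Sunny", "Hot", "High", "Strong", "No"),
    (3, "Overcast", "Hot", "High", "Light", "Yes"),
    (4, "Rain", "Mild", "High", "Light", "Yes"),
    (5, "Rain", "Cool", "Normal", "Light", "Yes"),
    (6, "Rain", "Cool", "Normal", "Strong", "No"),
    (7, "Overcast", "Cool", "Normal", "Strong", "Yes"),
    (8, "Sunny", "Mild", "High", "Light", "No"),
    (9, "Sunny", "Cool", "Normal", "Light", "Yes"),
    (10, "Rain", "Mild", "Normal", "Light", "Yes"),
    (11, "Sunny", "Mild", "Normal", "Strong", "Yes"),
    (12, "Overcast", "Mild", "High", "Strong", "Yes"),
    (13, "Overcast", "Hot", "Normal", "Light", "Yes"),
    (14, "Rain", "Mild", "High", "Strong", "No"),
]

def class_counts_by(attribute_index, rows):
    """Tally the class label (last column) for each value of one attribute."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[attribute_index]][row[-1]] += 1
    return table

by_outlook = class_counts_by(1, ROWS)
for value in ("Sunny", "Overcast", "Rain"):
    print(value, dict(by_outlook[value]))
# Sunny {'No': 3, 'Yes': 2}
# Overcast {'Yes': 4}
# Rain {'Yes': 3, 'No': 2}
```

Every Overcast day is labelled Yes, so that value of Outlook needs no further test, whereas the Sunny and Rain subsets are still mixed.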