Data Mining:
                 Concepts and Techniques

                        ©SB Institute




April 17, 2012           Data Mining: Concepts and Techniques   1
Chapter 1. Introduction

          Motivation: Why data mining?
          What is data mining?
          Data Mining: On what kind of data?
          Data mining functionality
          Are all the patterns interesting?
          Classification of data mining systems
          Major issues in data mining

Motivation: “Necessity is the
                 Mother of Invention”

         Data explosion problem
                Automated data collection tools and mature database
                 technology lead to tremendous amounts of data stored in
                 databases, data warehouses and other information repositories


         We are drowning in data, but starving for knowledge!
         Solution: Data warehousing and data mining
                Data warehousing and on-line analytical processing

                Extraction of interesting knowledge (rules, regularities,
                 patterns, constraints) from data in large databases
Evolution of Database Technology

         1960s:
                Data collection, database creation, IMS and network DBMS
         1970s:
                Relational data model, relational DBMS implementation
         1980s:
                RDBMS, advanced data models (extended-relational, OO,
                 deductive, etc.) and application-oriented DBMS (spatial,
                 scientific, engineering, etc.)
         1990s—2000s:
                Data mining and data warehousing, multimedia databases, and
                 Web databases
What Is Data Mining?

            Data mining (knowledge discovery in databases):

                  Extraction of interesting (non-trivial, implicit, previously
                   unknown and potentially useful) information or patterns
                   from data in large databases
            Alternative names and their “inside stories”:
                  Data mining: a misnomer?
                  Knowledge discovery (mining) in databases (KDD),
                   knowledge extraction, data/pattern analysis, data
                   archeology, data dredging, information harvesting,
                   business intelligence, etc.
            What is not data mining?
                  (Deductive) query processing.
April 17, 2012     Expert systems or small Concepts and Techniques
                                   Data Mining: ML/statistical programs           5
Why Data Mining? — Potential
                 Applications

        Database analysis and decision support
                Market analysis and management
                     target marketing, customer relation management, market
                      basket analysis, cross selling, market segmentation
                Risk analysis and management
                     Forecasting, customer retention, improved underwriting,
                      quality control, competitive analysis
                Fraud detection and management
        Other Applications
                Text mining (news group, email, documents) and Web analysis.
                Intelligent query answering

Market Analysis and Management (1)

         Where are the data sources for analysis?
                Credit card transactions, loyalty cards, discount coupons,
                 customer complaint calls, plus (public) lifestyle studies
         Target marketing
                Find clusters of “model” customers who share the same
                 characteristics: interest, income level, spending habits, etc.
         Determine customer purchasing patterns over time
                Conversion of a single to a joint bank account: marriage, etc.
         Cross-market analysis
                Associations/correlations between product sales
                Prediction based on the association information
Market Analysis and Management (2)

        Customer profiling
                data mining can tell you what types of customers buy what
                 products (clustering or classification)
        Identifying customer requirements
                identifying the best products for different customers
                use prediction to find what factors will attract new customers
        Provides summary information
                various multidimensional summary reports
                statistical summary information (data central tendency and
                 variation)
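
A minimal sketch of the statistical summary mentioned above, assuming a made-up list of purchase amounts. The function name `summarize` and the data are illustrative only and do not come from the slides; the sketch reports central tendency (mean, median) and variation (sample standard deviation, range).

```python
import statistics

def summarize(values):
    """Return central-tendency and variation measures for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),   # sample standard deviation
        "range": max(values) - min(values),
    }

# Hypothetical purchase amounts for one customer segment.
purchases = [20, 30, 30, 40, 80]
summary = summarize(purchases)
print(summary)
```

The single large purchase pulls the mean above the median, which is the kind of contrast a summary report can surface.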
Corporate Analysis and Risk
                    Management

        Finance planning and asset evaluation
                cash flow analysis and prediction
                contingent claim analysis to evaluate assets
                cross-sectional and time series analysis (financial-ratio, trend
                 analysis, etc.)
        Resource planning:
                summarize and compare the resources and spending
        Competition:
                monitor competitors and market directions
                group customers into classes and set a class-based pricing
                 procedure
                set pricing strategy in a highly competitive market

Fraud Detection and Management (1)

         Applications
                widely used in health care, retail, credit card services,
                 telecommunications (phone card fraud), etc.
         Approach
                use historical data to build models of fraudulent behavior and
                 use data mining to help identify similar instances
         Examples
                auto insurance: detect a group of people who stage accidents to
                 collect on insurance
                money laundering: detect suspicious money transactions (US
                 Treasury's Financial Crimes Enforcement Network)
                medical insurance: detect professional patients, rings of
                 doctors, and rings of references
Fraud Detection and Management (2)

         Detecting inappropriate medical treatment
                The Australian Health Insurance Commission found that in many
                 cases blanket screening tests were requested (saving
                 A$1m/yr).
         Detecting telephone fraud
                Telephone call model: destination of the call, duration, time of
                 day or week. Analyze patterns that deviate from an expected
                 norm.
                British Telecom identified discrete groups of callers with frequent
                 intra-group calls, especially mobile phones, and broke a
                 multimillion dollar fraud.
         Retail
                Analysts estimate that 38% of retail shrink is due to dishonest
                 employees.
Other Applications

          Sports
                IBM Advanced Scout analyzed NBA game statistics (shots
                 blocked, assists, and fouls) to gain competitive advantage for
                 New York Knicks and Miami Heat
          Astronomy
                JPL and the Palomar Observatory discovered 22 quasars with
                 the help of data mining
          Internet Web Surf-Aid
                IBM Surf-Aid applies data mining algorithms to Web access
                 logs for market-related pages to discover customer preferences
                 and behavior, analyze the effectiveness of Web marketing,
                 improve Web site organization, etc.
Data Mining: A KDD Process

        Data mining: the core of the knowledge discovery process.
        [Figure: the KDD process, from bottom to top: Databases; Data Cleaning
         and Data Integration; Data Warehouse; Selection; Task-relevant Data;
         Data Mining; Pattern Evaluation]
Steps of a KDD Process

         Learning the application domain:
                relevant prior knowledge and goals of application
         Creating a target data set: data selection
         Data cleaning and preprocessing: (may take 60% of effort!)
         Data reduction and transformation:
                Find useful features, dimensionality/variable reduction, invariant
                 representation.
         Choosing functions of data mining
                summarization, classification, regression, association, clustering.
         Choosing the mining algorithm(s)
         Data mining: search for patterns of interest
         Pattern evaluation and knowledge presentation
                visualization, transformation, removing redundant patterns, etc.
         Use of discovered knowledge
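
The steps above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the records, the function names (`clean`, `transform`, `mine`, `evaluate`) and the `min_count` threshold are all hypothetical, and a simple frequency count stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records: (age, item bought). None marks a missing value.
raw = [(23, "PC"), (27, "PC"), (None, "TV"), (35, "TV"), (24, "PC"), (41, None)]

def clean(records):
    """Data cleaning: drop records with missing fields."""
    return [r for r in records if None not in r]

def transform(records):
    """Data transformation: replace the exact age by a decade bucket, e.g. 23 -> '20..29'."""
    out = []
    for age, item in records:
        low = age // 10 * 10
        out.append((f"{low}..{low + 9}", item))
    return out

def mine(records):
    """Data mining: count how often each (age bucket, item) pair occurs."""
    return Counter(records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}

patterns = evaluate(mine(transform(clean(raw))), min_count=2)
print(patterns)
```

Only the pattern pairing the `20..29` bucket with `PC` survives evaluation; the two incomplete records never reach the mining step.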
Data Mining and Business Intelligence
     [Figure: pyramid of layers; the potential to support business decisions
      increases from bottom to top. Layers, top to bottom:]
          Making Decisions
          Data Presentation: Visualization Techniques
          Data Mining: Information Discovery
          Data Exploration: Statistical Analysis, Querying and Reporting
          Data Warehouses / Data Marts: OLAP, MDA
          Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
     Roles shown alongside the pyramid, top to bottom: End User, Business
      Analyst, Data Analyst, DBA
Architecture of a Typical Data
                     Mining System
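        [Figure: layered architecture, top to bottom: Graphical user interface;
         Pattern evaluation; Data mining engine; Database or data warehouse
         server; Data cleaning & data integration, and Filtering; Databases
         and Data Warehouse. A Knowledge-base is shown alongside the upper
         layers.]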
Data Mining: On What Kind of
                 Data?

            Relational databases
            Data warehouses
            Transactional databases
            Advanced DB and information repositories
                    Object-oriented and object-relational databases
                    Spatial databases
                    Time-series data and temporal data
                    Text databases and multimedia databases
                    Heterogeneous and legacy databases
                    WWW
Data Mining Functionalities (1)

         Concept description: Characterization and
          discrimination
                Generalize, summarize, and contrast data
                 characteristics, e.g., dry vs. wet regions
         Association (correlation and causality)
                Multi-dimensional vs. single-dimensional association
                age(X, “20..29”) ^ income(X, “20..29K”) → buys(X,
                 “PC”) [support = 2%, confidence = 60%]
                contains(T, “computer”) → contains(T, “software”)
                 [1%, 75%]
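
The bracketed figures attached to each rule are its support and confidence. A minimal sketch of how they can be computed, assuming transactions are stored as Python sets; the function names and the four sample transactions are illustrative and do not come from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A).

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical transactions.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
]
rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(rule_support, rule_confidence)
```

Here the rule computer → software holds in 2 of 4 transactions (support 50%), and 2 of the 3 transactions containing a computer also contain software (confidence about 67%).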
Data Mining Functionalities (2)

         Classification and Prediction
                Finding models (functions) that describe and distinguish classes
                 or concepts for future prediction
                E.g., classify countries based on climate, or classify cars based
                 on gas mileage
                Presentation: decision-tree, classification rule, neural network
                Prediction: Predict some unknown or missing numerical values
         Cluster analysis
                Class label is unknown: Group data to form new classes, e.g.,
                 cluster houses to find distribution patterns
                Clustering based on the principle: maximizing the intra-class
                 similarity and minimizing the interclass similarity
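
As an illustration of grouping unlabeled data, the sketch below runs a plain one-dimensional k-means. The slides do not name a clustering algorithm; k-means is used here only as one familiar example, and the house prices and hand-picked starting centers are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on numbers: assign each point to its nearest center,
    then move each center to the mean of the points assigned to it."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical house prices (in thousands); initial centers are chosen by hand.
prices = [100, 110, 120, 400, 410, 420]
centers, clusters = kmeans_1d(prices, centers=[100, 420])
print(centers, clusters)
```

Points within each resulting group are close to one another (high intra-class similarity) and far from the other group (low interclass similarity).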
Data Mining Functionalities (3)

          Outlier analysis
                Outlier: a data object that does not comply with the general behavior
                 of the data
                It can be considered as noise or exception but is quite useful in fraud
                 detection, rare events analysis

          Trend and evolution analysis
                Trend and deviation: regression analysis
                Sequential pattern mining, periodicity analysis
                Similarity-based analysis
          Other pattern-directed or statistical analyses
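
One simple way to flag objects that do not comply with the general behavior of the data is a z-score test. This is a sketch under assumptions: the threshold of 2 standard deviations, the function name `zscore_outliers`, and the call durations are all illustrative, and real fraud detection would use richer models.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical call durations in minutes; 240 is the planted anomaly.
durations = [5, 7, 6, 8, 5, 6, 7, 240]
outliers = zscore_outliers(durations)
print(outliers)
```

Depending on the application, the flagged value is either discarded as noise or investigated as a possible exception.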
Are All the “Discovered” Patterns
                 Interesting?
        A data mining system/query may generate thousands of patterns;
         not all of them are interesting.
                Suggested approach: Human-centered, query-based, focused mining
        Interestingness measures: A pattern is interesting if it is easily
         understood by humans, valid on new or test data with some degree
         of certainty, potentially useful, novel, or validates some hypothesis
         that a user seeks to confirm
        Objective vs. subjective interestingness measures:
                Objective: based on statistics and structures of patterns, e.g., support,
                 confidence, etc.
                Subjective: based on user’s belief in the data, e.g., unexpectedness,
                 novelty, actionability, etc.
Can We Find All and Only
                  Interesting Patterns?

         Find all the interesting patterns: Completeness
                Can a data mining system find all the interesting patterns?
                Association vs. classification vs. clustering
         Search for only interesting patterns: Optimization
                Can a data mining system find only the interesting patterns?
                Approaches
                     First generate all the patterns and then filter out the
                      uninteresting ones.
                     Generate only the interesting patterns—mining query
                      optimization
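
The two approaches can be contrasted on frequent-itemset mining. The sketch below is a simplified, Apriori-style illustration, not a full implementation: "interesting" is assumed to mean "occurs in at least `MIN_COUNT` transactions", and the transactions and function names are made up.

```python
from itertools import combinations

# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"tea"},
]
MIN_COUNT = 2  # assumed interestingness threshold

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def generate_then_filter():
    """Approach 1: enumerate every itemset, then discard the infrequent ones."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset(c) for k in range(1, len(items) + 1)
                  for c in combinations(items, k)]
    frequent = {c for c in candidates if count(c) >= MIN_COUNT}
    return frequent, len(candidates)

def prune_while_generating():
    """Approach 2: grow itemsets level by level, extending only frequent ones."""
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items}
    frequent, examined = set(), 0
    while level:
        examined += len(level)
        kept = {c for c in level if count(c) >= MIN_COUNT}
        frequent |= kept
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        level = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
    return frequent, examined

all_a, examined_a = generate_then_filter()
all_b, examined_b = prune_while_generating()
print(sorted(map(sorted, all_a)), examined_a, examined_b)
```

Both approaches return the same five frequent itemsets, but the first examines 15 candidates and the second only 8, which is the motivation for pushing the interestingness constraint into the search.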
Data Mining: Confluence of Multiple
                 Disciplines
        [Figure: Data Mining at the center, drawing on Database Technology,
         Statistics, Machine Learning, Visualization, Information Science,
         and Other Disciplines]
Data Mining: Classification Schemes

                General functionality
                    Descriptive data mining
                    Predictive data mining
                Different views, different classifications
                    Kinds of databases to be mined
                    Kinds of knowledge to be discovered
                    Kinds of techniques utilized
                    Kinds of applications adapted
A Multi-Dimensional View of Data
             Mining Classification
         Databases to be mined
            Relational, transactional, object-oriented, object-relational,
             active, spatial, time-series, text, multi-media, heterogeneous,
             legacy, WWW, etc.
         Knowledge to be mined
            Characterization, discrimination, association, classification,
             clustering, trend, deviation and outlier analysis, etc.
            Multiple/integrated functions and mining at multiple levels
         Techniques utilized
            Database-oriented, data warehouse (OLAP), machine learning,
             statistics, visualization, neural network, etc.
         Applications adapted
                Retail, telecommunication, banking, fraud analysis, DNA mining, stock
                 market analysis, Web mining, Weblog analysis, etc.
OLAP Mining: An Integration of Data
                 Mining and Data Warehousing

        Data mining systems, DBMS, Data warehouse
         systems coupling
                No coupling, loose-coupling, semi-tight-coupling, tight-coupling
        On-line analytical mining data
                integration of mining and OLAP technologies
        Interactive mining of multi-level knowledge
                Necessity of mining knowledge and patterns at different levels of
                 abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
        Integration of multiple mining functions
                Characterized classification, first clustering and then association
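
Two of the OLAP operations named above, rolling up and slicing, can be sketched on a tiny fact table. The table, its dimensions (city, country, quarter) and the function names are hypothetical; a real OLAP engine would operate on a data cube rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact table: (city, country, quarter, sales).
facts = [
    ("Vancouver", "Canada", "Q1", 100),
    ("Toronto",   "Canada", "Q1", 150),
    ("Vancouver", "Canada", "Q2", 120),
    ("Chicago",   "USA",    "Q1", 200),
    ("Chicago",   "USA",    "Q2", 180),
]

def roll_up(rows):
    """Roll up from city to country: total sales per (country, quarter)."""
    totals = defaultdict(int)
    for _city, country, quarter, sales in rows:
        totals[(country, quarter)] += sales
    return dict(totals)

def slice_quarter(rows, quarter):
    """Slice: fix one value of the quarter dimension."""
    return [r for r in rows if r[2] == quarter]

by_country = roll_up(facts)
q1_only = slice_quarter(facts, "Q1")
print(by_country)
print(q1_only)
```

Mining the rolled-up table and the city-level table would yield patterns at two different levels of abstraction.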
An OLAM Architecture
  [Figure: four-layer OLAM architecture, top to bottom:]
      Layer 4, User Interface: User GUI API (mining query in, mining result out)
      Layer 3, OLAP/OLAM: OLAM Engine and OLAP Engine, above the Data Cube API
      Layer 2, MDDB: MDDB and Meta Data, above the Database API
       (Filtering & Integration, Filtering)
      Layer 1, Data Repository: Databases and Data Warehouse
       (Data cleaning, Data integration)
Major Issues in Data Mining (1)

       Mining methodology and user interaction
                Mining different kinds of knowledge in databases
                Interactive mining of knowledge at multiple levels of abstraction
                Incorporation of background knowledge
                Data mining query languages and ad-hoc data mining
                Expression and visualization of data mining results
                Handling noise and incomplete data
                Pattern evaluation: the interestingness problem
       Performance and scalability
                Efficiency and scalability of data mining algorithms
                Parallel, distributed and incremental mining methods
Major Issues in Data Mining (2)

      Issues relating to the diversity of data types
            Handling relational and complex types of data
            Mining information from heterogeneous databases and global
             information systems (WWW)
      Issues related to applications and social impacts
            Application of discovered knowledge
                Domain-specific data mining tools
                Intelligent query answering
                Process control and decision making
            Integration of the discovered knowledge with existing knowledge:
             A knowledge fusion problem
            Protection of data security, integrity, and privacy
Summary
        Data mining: discovering interesting patterns from large amounts of
         data
        A natural evolution of database technology, in great demand, with
         wide applications
        A KDD process includes data cleaning, data integration, data
         selection, transformation, data mining, pattern evaluation, and
         knowledge presentation
        Mining can be performed in a variety of information repositories
        Data mining functionalities: characterization, discrimination,
         association, classification, clustering, outlier and trend analysis, etc.
        Classification of data mining systems
        Major issues in data mining
Where to Find References?

      Data mining and KDD (SIGKDD member CDROM):
            Conference proceedings: KDD, and others, such as PKDD, PAKDD, etc.
            Journal: Data Mining and Knowledge Discovery
      Database field (SIGMOD member CD ROM):
            Conference proceedings: ACM-SIGMOD, ACM-PODS, VLDB, ICDE, EDBT,
             DASFAA
            Journals: ACM-TODS, J. ACM, IEEE-TKDE, JIIS, etc.
      AI and Machine Learning:
            Conference proceedings: Machine learning, AAAI, IJCAI, etc.
            Journals: Machine Learning, Artificial Intelligence, etc.
      Statistics:
            Conference proceedings: Joint Stat. Meeting, etc.
            Journals: Annals of statistics, etc.
      Visualization:
            Conference proceedings: CHI, etc.
            Journals: IEEE Trans. visualization and computer graphics, etc.
References

          U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in
           Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
          J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann,
           2000.
          T. Imielinski and H. Mannila. A database perspective on knowledge discovery.
           Communications of ACM, 39:58-64, 1996.
          G. Piatetsky-Shapiro, U. Fayyad, and P. Smyth. From data mining to knowledge
           discovery: An overview. In U.M. Fayyad, et al. (eds.), Advances in Knowledge
           Discovery and Data Mining, 1-35. AAAI/MIT Press, 1996.
          G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/
           MIT Press, 1991.




SB Institute
                    Centre for Knowledge Transfer




                 Thank you !!!
Trends in DM.pptx
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.ppt
 
10appl
10appl10appl
10appl
 
Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 

Plus de Dr. C.V. Suresh Babu (20)

Data analytics with R
Data analytics with RData analytics with R
Data analytics with R
 
Association rules
Association rulesAssociation rules
Association rules
 
Clustering
ClusteringClustering
Clustering
 
Classification
ClassificationClassification
Classification
 
Blue property assumptions.
Blue property assumptions.Blue property assumptions.
Blue property assumptions.
 
Introduction to regression
Introduction to regressionIntroduction to regression
Introduction to regression
 
DART
DARTDART
DART
 
Mycin
MycinMycin
Mycin
 
Expert systems
Expert systemsExpert systems
Expert systems
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Bayes network
Bayes networkBayes network
Bayes network
 
Bayes' theorem
Bayes' theoremBayes' theorem
Bayes' theorem
 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
 
Rule based system
Rule based systemRule based system
Rule based system
 
Formal Logic in AI
Formal Logic in AIFormal Logic in AI
Formal Logic in AI
 
Production based system
Production based systemProduction based system
Production based system
 
Game playing in AI
Game playing in AIGame playing in AI
Game playing in AI
 
Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AI
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 

Dernier

Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEaurabinda banchhor
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 

Dernier (20)

Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Dust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSEDust Of Snow By Robert Frost Class-X English CBSE
Dust Of Snow By Robert Frost Class-X English CBSE
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 

D

  • 1. Data Mining: Concepts and Techniques ©SB Institute April 17, 2012 Data Mining: Concepts and Techniques 1
  • 2. Chapter 1. Introduction  Motivation: Why data mining?  What is data mining?  Data Mining: On what kind of data?  Data mining functionality  Are all the patterns interesting?  Classification of data mining systems  Major issues in data mining
  • 3. Motivation: “Necessity is the Mother of Invention”  Data explosion problem  Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories  We are drowning in data, but starving for knowledge!  Solution: Data warehousing and data mining  Data warehousing and on-line analytical processing  Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
  • 4. Evolution of Database Technology  1960s:  Data collection, database creation, IMS and network DBMS  1970s:  Relational data model, relational DBMS implementation  1980s:  RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)  1990s–2000s:  Data mining and data warehousing, multimedia databases, and Web databases
  • 5. What Is Data Mining?  Data mining (knowledge discovery in databases):  Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases  Alternative names and their “inside stories”:  Data mining: a misnomer?  Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.  What is not data mining?  (Deductive) query processing.  Expert systems or small ML/statistical programs
  • 6. Why Data Mining? Potential Applications  Database analysis and decision support  Market analysis and management  target marketing, customer relation management, market basket analysis, cross selling, market segmentation  Risk analysis and management  Forecasting, customer retention, improved underwriting, quality control, competitive analysis  Fraud detection and management  Other Applications  Text mining (news group, email, documents) and Web analysis.  Intelligent query answering
  • 7. Market Analysis and Management (1)  Where are the data sources for analysis?  Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies  Target marketing  Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.  Determine customer purchasing patterns over time  Conversion of single to a joint bank account: marriage, etc.  Cross-market analysis  Associations/co-relations between product sales  Prediction based on the association information
  • 8. Market Analysis and Management (2)  Customer profiling  data mining can tell you what types of customers buy what products (clustering or classification)  Identifying customer requirements  identifying the best products for different customers  use prediction to find what factors will attract new customers  Provides summary information  various multidimensional summary reports  statistical summary information (data central tendency and variation)
  • 9. Corporate Analysis and Risk Management  Finance planning and asset evaluation  cash flow analysis and prediction  contingent claim analysis to evaluate assets  cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)  Resource planning:  summarize and compare the resources and spending  Competition:  monitor competitors and market directions  group customers into classes and a class-based pricing procedure  set pricing strategy in a highly competitive market
  • 10. Fraud Detection and Management (1)  Applications  widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.  Approach  use historical data to build models of fraudulent behavior and use data mining to help identify similar instances  Examples  auto insurance: detect a group of people who stage accidents to collect on insurance  money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)  medical insurance: detect professional patients, rings of doctors, and rings of references
  • 11. Fraud Detection and Management (2)  Detecting inappropriate medical treatment  Australian Health Insurance Commission identified that in many cases blanket screening tests were requested (saving Australian $1m/yr).  Detecting telephone fraud  Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.  British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud.  Retail  Analysts estimate that 38% of retail shrink is due to dishonest employees.
  • 12. Other Applications  Sports  IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat  Astronomy  JPL and the Palomar Observatory discovered 22 quasars with the help of data mining  Internet Web Surf-Aid  IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyzing effectiveness of Web marketing, improving Web site organization, etc.
  • 13. Data Mining: A KDD Process  Data mining: the core of the knowledge discovery process.  [Figure: Databases, then Data Cleaning and Data Integration, then Data Warehouse, then Selection, then Task-relevant Data, then Data Mining, then Pattern Evaluation]
  • 14. Steps of a KDD Process  Learning the application domain:  relevant prior knowledge and goals of application  Creating a target data set: data selection  Data cleaning and preprocessing: (may take 60% of effort!)  Data reduction and transformation:  Find useful features, dimensionality/variable reduction, invariant representation.  Choosing functions of data mining  summarization, classification, regression, association, clustering.  Choosing the mining algorithm(s)  Data mining: search for patterns of interest  Pattern evaluation and knowledge presentation  visualization, transformation, removing redundant patterns, etc.  Use of discovered knowledge
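The KDD steps listed on slide 14 can be read as a chain of small functions. The sketch below is only an illustration under stated assumptions: the record fields, the function names, and the `min_count` threshold are all hypothetical, and the "mining" step is a plain frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records; the field names are assumptions for illustration.
raw_records = [
    {"customer": "a", "item": "PC ", "amount": 900},
    {"customer": "b", "item": "pc", "amount": None},  # incomplete record
    {"customer": "c", "item": "software", "amount": 50},
    {"customer": "d", "item": "PC", "amount": 1100},
]

def select(records):
    """Create the target data set: keep only the task-relevant fields."""
    return [{"item": r["item"], "amount": r["amount"]} for r in records]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def transform(records):
    """Data transformation: normalise item names to one representation."""
    return [{**r, "item": r["item"].strip().lower()} for r in records]

def mine(records):
    """Data mining (stand-in): count how often each item occurs."""
    return Counter(r["item"] for r in records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only items seen at least min_count times."""
    return {item: n for item, n in patterns.items() if n >= min_count}

knowledge = evaluate(mine(transform(clean(select(raw_records)))))
print(knowledge)
```

Each stage takes the previous stage's output, so a stage can be swapped (for example, a different cleaning rule) without touching the others.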
  • 15. Data Mining and Business Intelligence  [Figure: pyramid whose potential to support business decisions increases from bottom to top. Layers, bottom to top: Data Sources (paper, files, information providers, database systems, OLTP); Data Warehouses / Data Marts (OLAP, MDA); Data Exploration (statistical analysis, querying and reporting); Data Mining (information discovery); Data Presentation (visualization techniques); Making Decisions. Roles shown alongside: DBA, Data Analyst, Business Analyst, End User]
  • 16. Architecture of a Typical Data Mining System  [Figure: layered architecture, top to bottom: graphical user interface; pattern evaluation; data mining engine; database or data warehouse server; data cleaning, data integration, and filtering; databases and data warehouse. A knowledge base sits alongside the upper layers]
  • 17. Data Mining: On What Kind of Data?  Relational databases  Data warehouses  Transactional databases  Advanced DB and information repositories  Object-oriented and object-relational databases  Spatial databases  Time-series data and temporal data  Text databases and multimedia databases  Heterogeneous and legacy databases  WWW
  • 18. Data Mining Functionalities (1)  Concept description: Characterization and discrimination  Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions  Association (correlation and causality)  Multi-dimensional vs. single-dimensional association  age(X, “20..29”) ^ income(X, “20..29K”) ⇒ buys(X, “PC”) [support = 2%, confidence = 60%]  contains(T, “computer”) ⇒ contains(T, “software”) [1%, 75%]
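The bracketed figures on the association rules of slide 18 are support and confidence. As a minimal sketch, assuming the usual definitions (support is the fraction of transactions containing both sides of the rule, confidence is that count divided by the number of transactions containing the left-hand side), they can be computed as follows. The toy transactions and the function name are made up for illustration.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    total = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)    # contain the left side
    n_both = sum(1 for t in transactions if both <= t)  # contain both sides
    support = n_both / total if total else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence

# Hypothetical market-basket data.
transactions = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]

support, confidence = support_confidence(
    transactions, {"computer"}, {"software"}
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here two of the four baskets contain both items, and two of the three baskets with a computer also contain software.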
  • 19. Data Mining Functionalities (2)  Classification and Prediction  Finding models (functions) that describe and distinguish classes or concepts for future prediction  E.g., classify countries based on climate, or classify cars based on gas mileage  Presentation: decision-tree, classification rule, neural network  Prediction: Predict some unknown or missing numerical values  Cluster analysis  Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns  Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity
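To make the clustering principle on slide 19 concrete, here is a small one-dimensional k-means sketch. The slides do not name an algorithm, so k-means is an assumption chosen for illustration, as are the house prices, the starting centers, and the function name.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on numbers: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical house prices (in thousands) with no class labels.
house_prices = [100, 110, 120, 400, 410, 420]
centers, clusters = kmeans_1d(house_prices, centers=[100, 420])
print(centers, clusters)
```

Points inside each resulting group are close to one another (high intra-class similarity) and far from the other group (low interclass similarity).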
  • 20. Data Mining Functionalities (3)  Outlier analysis  Outlier: a data object that does not comply with the general behavior of the data  It can be considered as noise or exception but is quite useful in fraud detection, rare events analysis  Trend and evolution analysis  Trend and deviation: regression analysis  Sequential pattern mining, periodicity analysis  Similarity-based analysis  Other pattern-directed or statistical analyses
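One simple way to flag objects that "do not comply with the general behavior of the data" is a z-score test. The sketch below is an assumption for illustration, not a method the slides prescribe: the threshold of 2 standard deviations, the call-duration data, and the function name are all hypothetical.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` population standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical call durations in minutes; one call is unusually long.
call_minutes = [3, 4, 5, 4, 3, 5, 4, 60]
print(zscore_outliers(call_minutes))
```

Whether a flagged value is noise to discard or an exception worth investigating (as in fraud detection) depends on the application.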
  • 21. Are All the “Discovered” Patterns Interesting?  A data mining system/query may generate thousands of patterns; not all of them are interesting.  Suggested approach: Human-centered, query-based, focused mining  Interestingness measures: A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm  Objective vs. subjective interestingness measures:  Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.  Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
  • 22. Can We Find All and Only Interesting Patterns?  Find all the interesting patterns: Completeness  Can a data mining system find all the interesting patterns?  Association vs. classification vs. clustering  Search for only interesting patterns: Optimization  Can a data mining system find only the interesting patterns?  Approaches  First generate all the patterns and then filter out the uninteresting ones.  Generate only the interesting patterns: mining query optimization
  • 23. Data Mining: Confluence of Multiple Disciplines  [Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]
  • 24. Data Mining: Classification Schemes  General functionality  Descriptive data mining  Predictive data mining  Different views, different classifications  Kinds of databases to be mined  Kinds of knowledge to be discovered  Kinds of techniques utilized  Kinds of applications adapted
  • 25. A Multi-Dimensional View of Data Mining Classification  Databases to be mined  Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc.  Knowledge to be mined  Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.  Multiple/integrated functions and mining at multiple levels  Techniques utilized  Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.  Applications adapted  Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.
  • 26. OLAP Mining: An Integration of Data Mining and Data Warehousing  Data mining systems, DBMS, Data warehouse systems coupling  No coupling, loose-coupling, semi-tight-coupling, tight-coupling  On-line analytical mining data  integration of mining and OLAP technologies  Interactive mining multi-level knowledge  Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.  Integration of multiple mining functions  Characterized classification, first clustering and then association
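The drilling/rolling and slicing operations mentioned on slide 26 can be sketched on a tiny fact table. This is a hedged illustration only: the (year, quarter, item, units) layout, the sample figures, and the `roll_up` helper are assumptions, not part of any OLAP product's API.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, item, units_sold).
sales = [
    ("2012", "Q1", "PC", 10),
    ("2012", "Q1", "software", 4),
    ("2012", "Q2", "PC", 7),
    ("2013", "Q1", "PC", 5),
]

def roll_up(facts, keep):
    """Aggregate the measure over every dimension not listed in `keep`.

    `keep` holds the positions of the dimensions to retain, so fewer
    positions means a higher level of abstraction.
    """
    totals = defaultdict(int)
    for *dims, measure in facts:
        key = tuple(dims[i] for i in keep)
        totals[key] += measure
    return dict(totals)

by_year = roll_up(sales, keep=[0])            # roll up to year level
by_year_item = roll_up(sales, keep=[0, 2])    # drill down: year and item
pc_slice = [f for f in sales if f[2] == "PC"] # slice: fix item = "PC"

print(by_year)
print(by_year_item)
print(pc_slice)
```

Moving from `by_year` to `by_year_item` is a drill-down; the reverse is a roll-up. Mining at multiple levels of abstraction means running the same analysis on each of these views.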
  • 27. An OLAM Architecture  [Figure: four layers, with a mining query entering at the top and a mining result returned. Layer 4, User Interface: user GUI API. Layer 3, OLAP/OLAM: OLAM engine and OLAP engine. Layer 2, MDDB: multidimensional databases and metadata, accessed through the data cube API. Layer 1, Data Repository: databases and data warehouse, accessed through the database API, with filtering, data cleaning, and data integration]
  • 28. Major Issues in Data Mining (1)  Mining methodology and user interaction  Mining different kinds of knowledge in databases  Interactive mining of knowledge at multiple levels of abstraction  Incorporation of background knowledge  Data mining query languages and ad-hoc data mining  Expression and visualization of data mining results  Handling noise and incomplete data  Pattern evaluation: the interestingness problem  Performance and scalability  Efficiency and scalability of data mining algorithms  Parallel, distributed and incremental mining methods
  • 29. Major Issues in Data Mining (2)  Issues relating to the diversity of data types  Handling relational and complex types of data  Mining information from heterogeneous databases and global information systems (WWW)  Issues related to applications and social impacts  Application of discovered knowledge  Domain-specific data mining tools  Intelligent query answering  Process control and decision making  Integration of the discovered knowledge with existing knowledge: A knowledge fusion problem  Protection of data security, integrity, and privacy
  • 30. Summary  Data mining: discovering interesting patterns from large amounts of data  A natural evolution of database technology, in great demand, with wide applications  A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation  Mining can be performed in a variety of information repositories  Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.  Classification of data mining systems  Major issues in data mining
  • 31. Where to Find References?  Data mining and KDD (SIGKDD member CDROM):  Conference proceedings: KDD, and others, such as PKDD, PAKDD, etc.  Journal: Data Mining and Knowledge Discovery  Database field (SIGMOD member CD ROM):  Conference proceedings: ACM-SIGMOD, ACM-PODS, VLDB, ICDE, EDBT, DASFAA  Journals: ACM-TODS, J. ACM, IEEE-TKDE, JIIS, etc.  AI and Machine Learning:  Conference proceedings: Machine learning, AAAI, IJCAI, etc.  Journals: Machine Learning, Artificial Intelligence, etc.  Statistics:  Conference proceedings: Joint Stat. Meeting, etc.  Journals: Annals of statistics, etc.  Visualization:  Conference proceedings: CHI, etc.  Journals: IEEE Trans. visualization and computer graphics, etc.
  • 32. References  U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.  J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.  T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of ACM, 39:58-64, 1996.  G. Piatetsky-Shapiro, U. Fayyad, and P. Smyth. From data mining to knowledge discovery: An overview. In U. M. Fayyad, et al. (eds.), Advances in Knowledge Discovery and Data Mining, 1-35. AAAI/MIT Press, 1996.  G. Piatetsky-Shapiro and W. J. Frawley. Knowledge Discovery in Databases. AAAI/MIT Press, 1991.
  • 33. SB Institute Centre for Knowledge Transfer  Thank you!