SlideShare a Scribd company logo
1 of 20
Download to read offline
CS501: DATABASE AND DATA
MINING
Data Mining Introduction1
WHY DATA MINING?
 The Explosive Growth of Data: from terabytes to petabytes
 Data collection and data availability Data collection and data availability
 Automated data collection tools, database systems, Web,
computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation,
…
 Society and everyone: news, digital cameras, YouTube Society and everyone: news, digital cameras, YouTube
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”—Data mining—Automated
analysis of massive data sets 2
EVOLUTION OF SCIENCES
 Before 1600, empirical science
 1600-1950s, theoretical science
E h di i li h h i l t Th ti l d l ft ti t Each discipline has grown a theoretical component. Theoretical models often motivate
experiments and generalize our understanding.
 1950s-1990s, computational science
 Over the last 50 years, most disciplines have grown a third, computational branch (e.g.Over the last 50 years, most disciplines have grown a third, computational branch (e.g.
empirical, theoretical, and computational ecology, or physics, or linguistics.)
 Computational Science traditionally meant simulation. It grew out of our inability to
find closed-form solutions for complex mathematical models.
 1990-now, data science
 The flood of data from new scientific instruments and simulations
 The ability to economically store and manage petabytes of data online
 The Internet and computing Grid that makes all these archives universally accessible The Internet and computing Grid that makes all these archives universally accessible
 Scientific info. management, acquisition, organization, query, and visualization tasks
scale almost linearly with data volumes. Data mining is a major new challenge!
 Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science,y y, p yp f ,
Comm. ACM, 45(11): 50-54, Nov. 2002 3
EVOLUTION OF DATABASE TECHNOLOGY
 1960s:
 Data collection, database creation, IMS and network DBMS
 1970s:
 Relational data model, relational DBMS implementation
 1980s: 1980s:
 RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
 Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990 1990s:
 Data mining, data warehousing, multimedia databases, and Web
databases
2000 2000s
 Stream data management and mining
 Data mining and its applications
 Web technology (XML, data integration) and global information systems 4
WHAT IS DATA MINING?
 Data mining (knowledge discovery from data)
E i f i i i i l i li i i l Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amount of datauge a ou t o data
 Data mining: a misnomer?
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.g g, g, g ,
 Watch out: Is everything “data mining”?
 Simple search and query processing
 (Deductive) expert systems 5
PROCESS OF KNOWLEDGE DISCOVERY
6
STEPS FOR KNOWLEDGE DISCOVERY PROCESSSTEPS FOR KNOWLEDGE DISCOVERY PROCESS
 Data cleaningg
 to remove noise or irrelevant data
 Data integration
 where multiple data sources are combined where multiple data sources are combined
 Data selection
 where data relevant to the task analysis is retrieved
 Data transformation
 where data are transformed or consolidated into
forms appropriate for mining by performing
summary or aggregation operations
 Data mining
 an essential process where intelligent methods arean essential process where intelligent methods are
applied in order to extract data patterns 7
S K D PSTEPS FOR KNOWLEDGE DISCOVERY PROCESS
 Pattern evaluation
 To identify the truly interesting patterns
representing knowledge based on some
interestingness measureinterestingness measure
 Knowledge representation
 Where visualization and knowledge representation
techniques are used to present the mined knowledge
to the user
Data mining is the process of discovering interesting knowledge from
large amount of data stored either in the data bases, data
warehouses, or other information repositories
8
, p
ARCHITECTURE OF DATA MINING SYSTEM
 Based on this view, the architecture of data,
mining system has the following major
components-
D t b d t h th i f ti Database, data warehouse or other information
repository
 Database or data warehouse server
 Knowledge base
 Data mining engine
 Pattern evaluation modulePattern evaluation module
 Graphical user interface
9
ARCHITECTURE OF A DATA MINING SYSTEM
10
DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES
Database
S i iTechnology Statistics
Data Mining
Machine
Learning
Visualization
Learning
PatternPattern
Recognition
Algorithm
Other
Disciplines
11
WHY NOT TRADITIONAL DATA ANALYSIS?
 Tremendous amount of data
 Algorithms must be highly scalable to handle such as tera-bytesAlgorithms must be highly scalable to handle such as tera bytes
of data
 High-dimensionality of data
 Micro-array may have tens of thousands of dimensions
 High complexity of data
 Data streams and sensor dataData streams and sensor data
 Time-series data, temporal data, sequence data
 Structure data, graphs, social networks and multi-linked data
 Heterogeneous databases and legacy databases
 Spatial, spatiotemporal, multimedia, text and Web data
 Software programs scientific simulations Software programs, scientific simulations
 New and sophisticated applications
12
DATA MINING- ON WHAT KIND OF DATA
 Different data store on which data mining can beg
done
 It is applicable to any kind of information
itrepository
 However, the challenges and techniques of
mining may differ for each of the repositorymining may differ for each of the repository
systems
13
DATA MINING: ON WHAT KINDS OF DATA?
 Database-oriented data sets and applications
 Relational database data warehouse transactional database Relational database, data warehouse, transactional database
 Advanced data sets and advanced applications
 Data streams and sensor data
 Time-series data, temporal data, sequence data (incl. bio-sequences)
 Structure data, graphs, social networks and multi-linked data
 Object-relational databases
 Heterogeneous databases and legacy databases
 Spatial data and spatiotemporal dataSpatial data and spatiotemporal data
 Multimedia database
 Text databases
 The World-Wide Web 14
RELATIONAL DATABASES
15
DATA WAREHOUSE
16
MULTIDIMENSIONAL DATA CUBE
Showing
summarized
data
17
TRANSACTIONAL DATABASE
 Consists of files where
each record represent
a transaction
T i ll i l d Typically includes a
transaction identity
number and a list of
items making up the
transaction
18
DATA MINING FUNCTIONALITIES
 Multidimensional concept description: Characterization and
discriminationdiscrimination
 Generalize, summarize, and contrast data characteristics, e.g.,
dry vs. wet regions
 Frequent patterns, association, correlation vs. causality
 Diaper  Beer [0.5%, 75%] (Correlation or causality?)
Cl ifi i d di i Classification and prediction
 Construct models (functions) that describe and distinguish
classes or concepts for future predictionp p
 E.g., classify countries based on (climate), or classify cars
based on (gas mileage)
 Predict some unknown or missing numerical values 19
DATA MINING FUNCTIONALITIES (2)
 Cluster analysis
 Class label is unknown: Group data to form new classes, e.g., clusterp , g ,
houses to find distribution patterns
 Maximizing intra-class similarity & minimizing interclass similarity
 Outlier analysisy
 Outlier: Data object that does not comply with the general behavior
of the data
 Noise or exception? Useful in fraud detection, rare events analysiso se o e cep o ? Use a e ec o , a e eve s a a ys s
 Trend and evolution analysis
 Trend and deviation: e.g., regression analysis
 Sequential pattern mining: e g digital camera  large SD memory Sequential pattern mining: e.g., digital camera  large SD memory
 Periodicity analysis
 Similarity-based analysis
O h di d i i l l Other pattern-directed or statistical analyses 20

More Related Content

What's hot

INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
Attila Barta
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
butest
 
Data Mining Concepts and Techniques
Data Mining Concepts and TechniquesData Mining Concepts and Techniques
Data Mining Concepts and Techniques
Pratik Tambekar
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
Slideshare
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 

What's hot (20)

INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Token
TokenToken
Token
 
Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
Data Mining Overview
Data Mining OverviewData Mining Overview
Data Mining Overview
 
Research Topics on Data Mining
Research Topics on Data MiningResearch Topics on Data Mining
Research Topics on Data Mining
 
2016 05-20-clariah-wp4
2016 05-20-clariah-wp42016 05-20-clariah-wp4
2016 05-20-clariah-wp4
 
Datamining
DataminingDatamining
Datamining
 
Data mining 1
Data mining 1Data mining 1
Data mining 1
 
Data Mining Concepts and Techniques
Data Mining Concepts and TechniquesData Mining Concepts and Techniques
Data Mining Concepts and Techniques
 
Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019
 
Data mining
Data miningData mining
Data mining
 
Major issues in data mining
Major issues in data miningMajor issues in data mining
Major issues in data mining
 
01 intro
01 intro01 intro
01 intro
 
introduction to data warehousing and mining
 introduction to data warehousing and mining introduction to data warehousing and mining
introduction to data warehousing and mining
 
data warehousing and data mining
data warehousing and data mining data warehousing and data mining
data warehousing and data mining
 
Big data mining
Big data miningBig data mining
Big data mining
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 

Similar to Cs501 dm intro

Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.ppt
PadmajaLaksh
 

Similar to Cs501 dm intro (20)

Chapter 1. Introduction.ppt
Chapter 1. Introduction.pptChapter 1. Introduction.ppt
Chapter 1. Introduction.ppt
 
Data Mining Intro
Data Mining IntroData Mining Intro
Data Mining Intro
 
data mining
data miningdata mining
data mining
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt01Introduction to data mining chapter 1.ppt
01Introduction to data mining chapter 1.ppt
 
01Intro.ppt
01Intro.ppt01Intro.ppt
01Intro.ppt
 
Unit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.pptUnit 1 (Chapter-1) on data mining concepts.ppt
Unit 1 (Chapter-1) on data mining concepts.ppt
 
Data Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notesData Mining mod1 ppt.pdf bca sixth semester notes
Data Mining mod1 ppt.pdf bca sixth semester notes
 
Introduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .pptIntroduction to Data Mining and technologies .ppt
Introduction to Data Mining and technologies .ppt
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
unit 1 DATA MINING.ppt
unit 1 DATA MINING.pptunit 1 DATA MINING.ppt
unit 1 DATA MINING.ppt
 
Moving forward data centric sciences weaving AI, Big Data & HPC
Moving forward data centric sciences  weaving AI, Big Data & HPCMoving forward data centric sciences  weaving AI, Big Data & HPC
Moving forward data centric sciences weaving AI, Big Data & HPC
 
Data Mining introduction and basic concepts
Data Mining introduction and basic conceptsData Mining introduction and basic concepts
Data Mining introduction and basic concepts
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Dwdm
DwdmDwdm
Dwdm
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Data Mining
Data MiningData Mining
Data Mining
 
Dm unit i r16
Dm unit i   r16Dm unit i   r16
Dm unit i r16
 
Dm lecture1
Dm lecture1Dm lecture1
Dm lecture1
 

More from Kamal Singh Lodhi (16)

Introduction to Data Structure
Introduction to Data Structure Introduction to Data Structure
Introduction to Data Structure
 
Stack Algorithm
Stack AlgorithmStack Algorithm
Stack Algorithm
 
Data Structure (MC501)
Data Structure (MC501)Data Structure (MC501)
Data Structure (MC501)
 
Cs501 trc drc
Cs501 trc drcCs501 trc drc
Cs501 trc drc
 
Cs501 transaction
Cs501 transactionCs501 transaction
Cs501 transaction
 
Cs501 rel algebra
Cs501 rel algebraCs501 rel algebra
Cs501 rel algebra
 
Cs501 mining frequentpatterns
Cs501 mining frequentpatternsCs501 mining frequentpatterns
Cs501 mining frequentpatterns
 
Cs501 intro
Cs501 introCs501 intro
Cs501 intro
 
Cs501 fd nf
Cs501 fd nfCs501 fd nf
Cs501 fd nf
 
Cs501 data preprocessingdw
Cs501 data preprocessingdwCs501 data preprocessingdw
Cs501 data preprocessingdw
 
Cs501 concurrency
Cs501 concurrencyCs501 concurrency
Cs501 concurrency
 
Cs501 cluster analysis
Cs501 cluster analysisCs501 cluster analysis
Cs501 cluster analysis
 
Cs501 classification prediction
Cs501 classification predictionCs501 classification prediction
Cs501 classification prediction
 
Attribute Classification
Attribute ClassificationAttribute Classification
Attribute Classification
 
Real Time ImageVideo Processing with Applications in Face Recognition
Real Time ImageVideo Processing with  Applications in Face Recognition   Real Time ImageVideo Processing with  Applications in Face Recognition
Real Time ImageVideo Processing with Applications in Face Recognition
 
Flow diagram
Flow diagramFlow diagram
Flow diagram
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Recently uploaded (20)

Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 

Cs501 dm intro

  • 1. CS501: DATABASE AND DATA MINING Data Mining Introduction1
  • 2. WHY DATA MINING?  The Explosive Growth of Data: from terabytes to petabytes  Data collection and data availability Data collection and data availability  Automated data collection tools, database systems, Web, computerized society  Major sources of abundant data  Business: Web, e-commerce, transactions, stocks, …  Science: Remote sensing, bioinformatics, scientific simulation, …  Society and everyone: news, digital cameras, YouTube Society and everyone: news, digital cameras, YouTube  We are drowning in data, but starving for knowledge!  “Necessity is the mother of invention”—Data mining—Automated analysis of massive data sets 2
  • 3. EVOLUTION OF SCIENCES  Before 1600, empirical science  1600-1950s, theoretical science E h di i li h h i l t Th ti l d l ft ti t Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.  1950s-1990s, computational science  Over the last 50 years, most disciplines have grown a third, computational branch (e.g.Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical, theoretical, and computational ecology, or physics, or linguistics.)  Computational Science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.  1990-now, data science  The flood of data from new scientific instruments and simulations  The ability to economically store and manage petabytes of data online  The Internet and computing Grid that makes all these archives universally accessible The Internet and computing Grid that makes all these archives universally accessible  Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!  Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science,y y, p yp f , Comm. ACM, 45(11): 50-54, Nov. 2002 3
  • 4. EVOLUTION OF DATABASE TECHNOLOGY  1960s:  Data collection, database creation, IMS and network DBMS  1970s:  Relational data model, relational DBMS implementation  1980s: 1980s:  RDBMS, advanced data models (extended-relational, OO, deductive, etc.)  Application-oriented DBMS (spatial, scientific, engineering, etc.)  1990 1990s:  Data mining, data warehousing, multimedia databases, and Web databases 2000 2000s  Stream data management and mining  Data mining and its applications  Web technology (XML, data integration) and global information systems 4
  • 5. WHAT IS DATA MINING?  Data mining (knowledge discovery from data) E i f i i i i l i li i i l Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of datauge a ou t o data  Data mining: a misnomer?  Alternative names  Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.g g, g, g ,  Watch out: Is everything “data mining”?  Simple search and query processing  (Deductive) expert systems 5
  • 6. PROCESS OF KNOWLEDGE DISCOVERY 6
  • 7. STEPS FOR KNOWLEDGE DISCOVERY PROCESSSTEPS FOR KNOWLEDGE DISCOVERY PROCESS  Data cleaningg  to remove noise or irrelevant data  Data integration  where multiple data sources are combined where multiple data sources are combined  Data selection  where data relevant to the task analysis is retrieved  Data transformation  where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations  Data mining  an essential process where intelligent methods arean essential process where intelligent methods are applied in order to extract data patterns 7
  • 8. S K D PSTEPS FOR KNOWLEDGE DISCOVERY PROCESS  Pattern evaluation  To identify the truly interesting patterns representing knowledge based on some interestingness measureinterestingness measure  Knowledge representation  Where visualization and knowledge representation techniques are used to present the mined knowledge to the user Data mining is the process of discovering interesting knowledge from large amount of data stored either in the data bases, data warehouses, or other information repositories 8 , p
  • 9. ARCHITECTURE OF DATA MINING SYSTEM  Based on this view, the architecture of data, mining system has the following major components- D t b d t h th i f ti Database, data warehouse or other information repository  Database or data warehouse server  Knowledge base  Data mining engine  Pattern evaluation modulePattern evaluation module  Graphical user interface 9
  • 10. ARCHITECTURE OF A DATA MINING SYSTEM 10
  • 11. DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES Database S i iTechnology Statistics Data Mining Machine Learning Visualization Learning PatternPattern Recognition Algorithm Other Disciplines 11
  • 12. WHY NOT TRADITIONAL DATA ANALYSIS?  Tremendous amount of data  Algorithms must be highly scalable to handle such as tera-bytesAlgorithms must be highly scalable to handle such as tera bytes of data  High-dimensionality of data  Micro-array may have tens of thousands of dimensions  High complexity of data  Data streams and sensor dataData streams and sensor data  Time-series data, temporal data, sequence data  Structure data, graphs, social networks and multi-linked data  Heterogeneous databases and legacy databases  Spatial, spatiotemporal, multimedia, text and Web data  Software programs scientific simulations Software programs, scientific simulations  New and sophisticated applications 12
  • 13. DATA MINING- ON WHAT KIND OF DATA  Different data store on which data mining can beg done  It is applicable to any kind of information itrepository  However, the challenges and techniques of mining may differ for each of the repositorymining may differ for each of the repository systems 13
  • 14. DATA MINING: ON WHAT KINDS OF DATA?  Database-oriented data sets and applications  Relational database data warehouse transactional database Relational database, data warehouse, transactional database  Advanced data sets and advanced applications  Data streams and sensor data  Time-series data, temporal data, sequence data (incl. bio-sequences)  Structure data, graphs, social networks and multi-linked data  Object-relational databases  Heterogeneous databases and legacy databases  Spatial data and spatiotemporal dataSpatial data and spatiotemporal data  Multimedia database  Text databases  The World-Wide Web 14
  • 18. TRANSACTIONAL DATABASE  Consists of files where each record represent a transaction T i ll i l d Typically includes a transaction identity number and a list of items making up the transaction 18
  • 19. DATA MINING FUNCTIONALITIES  Multidimensional concept description: Characterization and discriminationdiscrimination  Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions  Frequent patterns, association, correlation vs. causality  Diaper  Beer [0.5%, 75%] (Correlation or causality?) Cl ifi i d di i Classification and prediction  Construct models (functions) that describe and distinguish classes or concepts for future predictionp p  E.g., classify countries based on (climate), or classify cars based on (gas mileage)  Predict some unknown or missing numerical values 19
  • 20. DATA MINING FUNCTIONALITIES (2)  Cluster analysis  Class label is unknown: Group data to form new classes, e.g., clusterp , g , houses to find distribution patterns  Maximizing intra-class similarity & minimizing interclass similarity  Outlier analysisy  Outlier: Data object that does not comply with the general behavior of the data  Noise or exception? Useful in fraud detection, rare events analysiso se o e cep o ? Use a e ec o , a e eve s a a ys s  Trend and evolution analysis  Trend and deviation: e.g., regression analysis  Sequential pattern mining: e g digital camera  large SD memory Sequential pattern mining: e.g., digital camera  large SD memory  Periodicity analysis  Similarity-based analysis O h di d i i l l Other pattern-directed or statistical analyses 20