1. Introduction
• Knowledge discovery is the process of automatically searching large
volumes of data for patterns that can be considered knowledge about the data.
• It can be categorized according to:
1) what kind of data is searched
2) in what form the result of the search is represented.
• Knowledge discovery developed out of the data mining domain and is closely
related to it in both methodology and terminology.
• Knowledge representation is a formalism for representing at least the data,
information and knowledge in an application.
• Knowledge can be represented either as programs in an imperative language or
as rules in a declarative language.
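To illustrate the last point, here is a minimal sketch of the same piece of knowledge written once as an imperative program and once as declarative rules. The customer rules, the thresholds, and the names `classify_imperative`, `classify_declarative` and `RULES` are hypothetical and chosen only for illustration; the rule interpreter is a simplified forward-chaining loop, not a full rule engine.

```python
# Hypothetical knowledge used for illustration only:
#   a customer is "high_value" if yearly spend exceeds 1000;
#   a high-value customer with no purchase in 90 days is "at_risk".

def classify_imperative(spend, days_since_purchase):
    """Knowledge encoded as control flow inside a program."""
    labels = set()
    if spend > 1000:
        labels.add("high_value")
        if days_since_purchase > 90:
            labels.add("at_risk")
    return labels


# The same knowledge as declarative rules: (conclusion, list of conditions).
RULES = [
    ("high_value", [lambda f: f["spend"] > 1000]),
    ("at_risk", [lambda f: "high_value" in f["labels"],
                 lambda f: f["days_since_purchase"] > 90]),
]


def classify_declarative(spend, days_since_purchase, rules=RULES):
    """Fire rules repeatedly until no new conclusion can be derived."""
    facts = {"spend": spend,
             "days_since_purchase": days_since_purchase,
             "labels": set()}
    changed = True
    while changed:
        changed = False
        for conclusion, conditions in rules:
            already_known = conclusion in facts["labels"]
            if not already_known and all(cond(facts) for cond in conditions):
                facts["labels"].add(conclusion)
                changed = True
    return facts["labels"]


print(classify_imperative(1500, 120))
print(classify_declarative(1500, 120))
```

In the declarative version the rules are data, so they can be added or changed without touching the interpreter.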
2. Knowledge Discovery
• It is also known as Knowledge Discovery in Databases (KDD).
• Data → Knowledge Discovery Process → useful information
• The process requires much elapsed time.
Five steps of KDD process
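The figure for this slide is not preserved in the transcript. Assuming it shows the five steps commonly listed for KDD (selection, preprocessing, transformation, data mining, interpretation/evaluation), the process can be sketched as a chain of functions. The record layout, the `age` field and all function names below are hypothetical, and the "mining" step is only a frequency count standing in for a real algorithm.

```python
from collections import Counter


def select(records, field):
    """Selection: keep only the attribute relevant to the task."""
    return [r.get(field) for r in records]


def preprocess(values):
    """Preprocessing: drop missing values."""
    return [v for v in values if v is not None]


def transform(values, bin_width):
    """Transformation: discretize numbers into bins named by their lower bound."""
    return [(v // bin_width) * bin_width for v in values]


def mine(binned):
    """Data mining: here simply a frequency count per bin."""
    return Counter(binned)


def interpret(counts):
    """Interpretation/evaluation: report the most frequent bin and its count."""
    return counts.most_common(1)[0]


def kdd(records, field, bin_width):
    """Run the five steps in order."""
    return interpret(mine(transform(preprocess(select(records, field)), bin_width)))


records = [{"age": 23}, {"age": 27}, {"age": None}, {"age": 41}, {"age": 25}, {}]
print(kdd(records, "age", 10))  # (20, 3): three ages fall in the 20-29 bin
```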
3. Data Mining
• Data mining involves many different algorithms to accomplish different
tasks.
• Data mining algorithms can be characterized as consisting of three parts:
  • Model: the purpose of the algorithm is to fit a model to the data.
  • Preference: some criterion must be used to prefer one model over another.
  • Search: all algorithms require some technique to search the data.
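The three parts can be seen in even a very small example. The sketch below assumes a line through the origin as the model, total squared error as the preference criterion, and an exhaustive search over a fixed list of candidate slopes; the data points and function names are made up for illustration.

```python
def predict(slope, x):
    """Model: a line through the origin, y = slope * x."""
    return slope * x


def squared_error(slope, points):
    """Preference: a lower total squared error means a better model."""
    return sum((y - predict(slope, x)) ** 2 for x, y in points)


def grid_search(points, candidates):
    """Search: try every candidate slope and keep the preferred one."""
    return min(candidates, key=lambda s: squared_error(s, points))


points = [(1, 2.1), (2, 3.9), (3, 6.2)]
candidates = [0.5 * i for i in range(9)]  # 0.0, 0.5, ..., 4.0
best = grid_search(points, candidates)
print(best)  # 2.0
```

Swapping any one part (a different model, a different error measure, a smarter search) gives a different algorithm while the overall structure stays the same.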
4. Classification of Data Mining
5. Working of Data Mining
• Data mining provides the link between separate transaction and analytical systems.
• Data mining software analyzes relationships and patterns in stored transaction data
based on user queries.
• Generally, four types of relationships are sought: classes, clusters, associations,
and sequential patterns.
Data mining:
• Extract, transform, and load transaction data
• Store and manage the data
• Provide data access to business analysts & IT professionals
• Analyze the data by application software
• Present the data in a useful format
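As one illustration of the "associations" relationship type, the following sketch counts how often pairs of items appear together in stored transactions. The baskets, the item names and the support threshold of 3 are hypothetical, and this is plain pair counting rather than any particular association-rule algorithm.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]
support = pair_support(transactions)
frequent = {pair for pair, n in support.items() if n >= 3}
print(frequent)  # {('bread', 'milk')}
```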
6. Clustering
WHAT IS A CLUSTER?
• A cluster is a collection of objects
that are “similar” to one another
and “dissimilar” to the objects
belonging to other clusters.
WHAT IS CLUSTERING?
• The process of organizing objects
into groups whose members are
similar in some way.
• Distance-based clustering and
conceptual clustering are two
types of clustering.
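A minimal sketch of the distance-based idea, assuming Euclidean distance as the similarity measure: a point joins the first existing group that already contains a member within a chosen threshold, otherwise it starts a new group. The points, the threshold and the greedy one-pass strategy are assumptions made for illustration; the greedy pass never merges groups, so it is not a complete clustering algorithm.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_distance(points, threshold):
    """Greedy one-pass grouping of points that lie close to each other."""
    clusters = []
    for p in points:
        for cluster in clusters:
            if any(euclidean(p, q) <= threshold for q in cluster):
                cluster.append(p)
                break
        else:
            # no existing cluster is close enough: start a new one
            clusters.append([p])
    return clusters


points = [(0, 0), (0, 1), (5, 5), (6, 5), (1, 0)]
clusters = group_by_distance(points, threshold=1.5)
print(clusters)  # [[(0, 0), (0, 1), (1, 0)], [(5, 5), (6, 5)]]
```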
Possible applications of Clustering
• Marketing
• Biology
• Libraries
• World Wide Web
Problems of clustering
• Current techniques cannot address all requirements adequately.
• A large number of data items can be problematic because of time complexity.
• The result can be interpreted in different ways.
• If an obvious distance measure does not exist, defining one is not easy.
Classification of Clustering Algorithms
Clustering algorithms:
• Exclusive
• Overlapping
• Hierarchical
• Probabilistic
K-means Clustering
[Figure: clustering on the “mouse” data set, showing the original data and the K-means clustering result]
• K-means is an iterative clustering algorithm in which items are moved
among sets of clusters until the desired set is reached.
• This definition assumes that each tuple has only one numeric value,
as opposed to a tuple with many attribute values.
K-means algorithm
Input:
• D = {t1, t2, ..., tn} // set of elements
• k // number of desired clusters
Output:
• K // set of clusters
Assign initial values for means m1, m2, ..., mk;
Repeat
    Assign each item ti to the cluster which has the closest mean;
    Calculate the new mean for each cluster;
Until the convergence criterion is met;
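The pseudocode above can be sketched in Python for the one-numeric-value case the slides assume. Several details are assumptions not given on the slide: the stopping rule (cluster memberships no longer change, with an iteration cap as a safety net), the tie-breaking rule (an item equally close to two means goes to the lower-numbered cluster), and keeping the old mean for a cluster that becomes empty. The function name `k_means_1d` is hypothetical.

```python
def k_means_1d(items, initial_means, max_iterations=100):
    """Cluster single numeric values around k means; returns (means, clusters)."""
    means = list(initial_means)
    clusters = []
    for _ in range(max_iterations):
        # Assign each item to the cluster which has the closest mean.
        new_clusters = [[] for _ in means]
        for t in items:
            # min() returns the first index on ties, so ties go to the
            # lower-numbered cluster (an assumption, see the lead-in).
            nearest = min(range(len(means)), key=lambda j: abs(t - means[j]))
            new_clusters[nearest].append(t)
        # Assumed convergence criterion: memberships did not change.
        if new_clusters == clusters:
            break
        clusters = new_clusters
        # Calculate the new mean for each cluster (keep the old mean if empty).
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
    return means, clusters


means, clusters = k_means_1d([1, 2, 3, 10, 11, 12], initial_means=[1, 2])
print(means, clusters)
```

The result depends on the initial means, so different starting values can lead to different final clusters.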
---Example---
Input: k = 2, D = {2, 4, 10, 12, 3, 20, 30, 11, 25}
Output:
m1     m2     K1                   K2
2      4      {2,3}                {4,10,12,20,30,11,25}
2.5    16     {2,3,4}              {10,12,20,30,11,25}
3      18     {2,3,4,10}           {12,20,30,11,25}
4.75   19.6   {2,3,4,10,11,12}     {20,30,25}
7      25     {2,3,4,10,11,12}     {20,30,25}
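The arithmetic in the example can be checked row by row: each row's means should be the averages of the clusters in the row before it. For instance, the first row's clusters give (2 + 3) / 2 = 2.5 and (4 + 10 + 12 + 20 + 30 + 11 + 25) / 7 = 16, which are the means in the second row. A small sketch of that check follows; the `rows` list simply transcribes the table and `check_trace` is a hypothetical helper name.

```python
from statistics import mean

# Each tuple is (m1, m2, K1, K2), transcribed from the example table.
rows = [
    (2, 4, [2, 3], [4, 10, 12, 20, 30, 11, 25]),
    (2.5, 16, [2, 3, 4], [10, 12, 20, 30, 11, 25]),
    (3, 18, [2, 3, 4, 10], [12, 20, 30, 11, 25]),
    (4.75, 19.6, [2, 3, 4, 10, 11, 12], [20, 30, 25]),
    (7, 25, [2, 3, 4, 10, 11, 12], [20, 30, 25]),
]


def check_trace(rows, tolerance=1e-9):
    """True if every row's means are the averages of the previous row's clusters."""
    for (_, _, k1, k2), (m1, m2, _, _) in zip(rows, rows[1:]):
        if abs(mean(k1) - m1) > tolerance or abs(mean(k2) - m2) > tolerance:
            return False
    return True


print(check_trace(rows))  # True
```

The last two rows contain the same clusters, which is where the iteration stops.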
Pictorial Representation
So we conclude with...
Thank You

Knowledge Discovery & Representation
