SUBMITTED BY: SHUVRA GHOSH
ROLL NO: 07
COURSE: MLIS
GUIDED BY: PROF. UDAYAN BHATTACHARYA
DEPARTMENT OF LIBRARY AND
INFORMATION SCIENCE
JADAVPUR UNIVERSITY
*
Knowledge discovery is the process of discovering valuable
information in a collection of data; that is, the process of
converting raw data into useful information.
It is an activity that produces knowledge by discovering it or
deriving it from existing information.
Knowledge discovery refers to the overall process of
discovering useful knowledge from data; data mining
refers to one particular step in this process.
*Why do we need the knowledge discovery
process?
*
• Database data
• Data Warehouse
• Transactional data
• Other kinds of data:
    Time-related data
    Sequence data (historical data records, stock exchange data)
    Data streams (video surveillance, sensor data)
    Spatial data (maps)
    Hypertext and multimedia data (text, video, audio)
    Graph and networked data
    Engineering design data (AutoCAD)
    Web data
*
• Interactive
• Iterative
• Procedure to extract knowledge from data
• Knowledge being searched for is –
implicit
previously unknown
potentially useful
*
Data Cleaning − in this step, noise and inconsistent data are
removed. Example: parsing the data.
Parsing is performed to detect syntax errors.
A parser decides whether a given string of data is
acceptable within the data specification.
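The parsing example above can be sketched in a few lines of Python. This is only an illustration: the three-field record layout (customer id, amount, date) and the names `parse_record`, `raw_lines`, and `clean_records` are assumptions made for the sketch, not part of the slides.

```python
from datetime import datetime


def parse_record(line):
    """Return (customer_id, amount, date), or None if the line violates the spec.

    Assumed specification: exactly three comma-separated fields,
    an integer id, a numeric amount, and a date in YYYY-MM-DD form.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return None  # wrong number of fields
    try:
        customer_id = int(parts[0])
        amount = float(parts[1])
        date = datetime.strptime(parts[2], "%Y-%m-%d").date()
    except ValueError:
        return None  # a field does not match its expected type
    return customer_id, amount, date


raw_lines = [
    "101, 250.00, 2023-01-15",
    "102, abc, 2023-01-16",    # amount is not a number
    "103, 75.50",              # missing field
    "104, 19.99, 2023-02-30",  # impossible date
]

# Keep only the records that the parser accepts.
clean_records = [r for r in map(parse_record, raw_lines) if r is not None]
```

Rejected lines are simply dropped here; a real cleaning step might instead log them or try to repair them.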
*
Data Integration − in this step, multiple data sources are combined.
Example: data from a bank's retail loan application, commercial loan
application, and demand deposit application are combined in the
bank's data warehouse.
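As a rough sketch of the bank example, the snippet below merges three source systems into one structure keyed by customer. The field names (`customer_id`, `balance`), the sample values, and the `integrate` helper are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Three hypothetical source systems, each holding its own rows.
retail_loans = [{"customer_id": 1, "balance": 5000.0}]
commercial_loans = [{"customer_id": 2, "balance": 120000.0}]
demand_deposits = [
    {"customer_id": 1, "balance": 830.0},
    {"customer_id": 2, "balance": 15000.0},
]


def integrate(sources):
    """Combine rows from several named sources into one record per customer."""
    combined = defaultdict(dict)
    for source_name, rows in sources.items():
        for row in rows:
            combined[row["customer_id"]][source_name] = row["balance"]
    return dict(combined)


warehouse = integrate({
    "retail_loan": retail_loans,
    "commercial_loan": commercial_loans,
    "demand_deposit": demand_deposits,
})
```

Each customer ends up with one integrated record, which is the property a data warehouse is meant to provide across applications.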
Data Selection − in this step, data relevant to the analysis task
are retrieved from the database.
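A minimal sketch of data selection, assuming the data sits in a relational database: only the rows and columns relevant to the analysis task are retrieved. The `loans` table, its columns, and the sample rows are invented for this example; an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# In-memory database standing in for the operational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (customer_id INTEGER, region TEXT, amount REAL, officer TEXT)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        (1, "East", 5000.0, "A. Roy"),
        (2, "West", 120000.0, "B. Sen"),
        (3, "East", 42000.0, "A. Roy"),
    ],
)

# Select only the attributes and rows needed for the (assumed) task:
# analysing loan amounts in the East region.
selected = conn.execute(
    "SELECT customer_id, amount FROM loans WHERE region = ? ORDER BY customer_id",
    ("East",),
).fetchall()
conn.close()
```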
*
Data Transformation − in this step, data is transformed or consolidated into
forms appropriate for mining by performing summary or aggregation
operations.
Aggregation operators perform mathematical operations such as Average,
Aggregate, Count, Max, Min, and Sum on a numeric property of the
elements in a collection.
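The aggregation operators named above can be illustrated with a short Python sketch that summarizes transaction amounts per region. The sample data and the `summarize` helper are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical (region, amount) transactions.
transactions = [
    ("East", 120.0),
    ("West", 80.0),
    ("East", 60.0),
    ("West", 40.0),
    ("East", 30.0),
]


def summarize(rows):
    """Group amounts by region and compute count, sum, min, max, and average."""
    groups = {}
    for region, amount in rows:
        groups.setdefault(region, []).append(amount)
    return {
        region: {
            "count": len(amounts),
            "sum": sum(amounts),
            "min": min(amounts),
            "max": max(amounts),
            "average": mean(amounts),
        }
        for region, amounts in groups.items()
    }


summary = summarize(transactions)
```

The result holds one summary row per region instead of one row per transaction, which is the consolidated form a mining algorithm would typically consume.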
*
Data Mining − in this step, intelligent methods are applied in order to
extract data patterns.
Intelligent methods include:
• Association
• Classification (for example, decision trees)
• Clustering
• Regression
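To make one of these methods concrete, here is a minimal sketch of association mining: it counts how often pairs of items occur together and keeps the pairs whose support reaches a threshold. The basket data, the `frequent_pairs` name, and the 0.5 threshold are assumptions; production systems would normally use an algorithm such as Apriori or FP-growth rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs appearing in enough transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(baskets, min_support=0.5)
```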
*
Pattern Evaluation − in this step, data patterns are evaluated.
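The slide does not say which measures are used for evaluation. As one common possibility, the sketch below scores an association rule by its support and confidence; the `rule_metrics` function and the basket data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_both = sum(
        1 for t in transactions if antecedent <= t and consequent <= t
    )
    support = with_both / n
    # Confidence is undefined when the antecedent never occurs; use 0.0 then.
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Evaluate the rule "bread -> milk".
support, confidence = rule_metrics(baskets, {"bread"}, {"milk"})
```

A pattern would then be kept or discarded by comparing these scores with thresholds chosen for the task.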
*
Knowledge Presentation − in this step, knowledge is
presented using various visualization tools:
• Table
• Chart
• Graph
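As a small illustration of the table form of presentation, the following sketch renders discovered rules as an aligned plain-text table. The `format_table` helper and the sample row are assumptions for illustration; charts and graphs would need a plotting library and are not shown.

```python
def format_table(headers, rows):
    """Render headers and rows as a left-aligned plain-text table."""
    table = [list(headers)] + [[str(cell) for cell in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in table) for i in range(len(headers))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in table
    ]
    # Separator line between the header and the data rows.
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


report = format_table(
    ["Rule", "Support", "Confidence"],
    [["bread -> milk", 0.5, 0.67]],
)
```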
*
Knowledge discovery process (KDP) models fall into three groups:
• Academic research models
• Industrial models
• Hybrid models
• Efforts to establish a KDP model were initiated in
academia in the mid-1990s.
• When the data mining (DM) field was taking shape, researchers
started defining multistep procedures to guide users of DM tools
through the complex knowledge discovery world.
• The two main process models, developed in 1996 and 1998, are the
nine-step model by Fayyad et al. and the eight-step model by
Anand and Buchner.
*
1. Developing and understanding the application domain. This step
includes learning the relevant prior knowledge and the goals of the end user of
the discovered knowledge.
2. Creating a target data set. Here the data miner selects a subset of variables
(attributes) and data points (examples) that will be used to perform discovery
tasks. This step usually includes querying the existing data to select the desired
subset.
3. Data cleaning and pre-processing. This step consists of removing outliers,
dealing with noise and missing values in the data, and accounting for time
sequence information and known changes.
4. Data reduction and projection. This step consists of finding useful
attributes by applying dimension reduction and transformation methods, and
finding an invariant representation of the data.
5. Choosing the data mining task. Here the data miner matches the goals
defined in Step 1 with a particular DM method, such as classification,
regression, clustering, etc.
*
Two representative industrial models are the five-step model by
Cabena et al., developed with support from IBM, and the six-step
CRISP-DM model, developed by a large consortium of
European companies.
*
The CRISP-DM (Cross-Industry Standard Process for Data Mining)
was first established in the late 1990s by four companies: Integral
Solutions Ltd. (a provider of commercial data mining solutions),
NCR (a database provider), DaimlerChrysler (an automobile
manufacturer), and OHRA (an insurance company).
*
The development of academic and industrial models has led to the
development of hybrid models, i.e., models that combine aspects of both.
One such model is a six-step KDP model developed by Cios et al.
Compared with CRISP-DM, the main differences and extensions include:
• providing a more general, research-oriented description of the steps;
• introducing a data mining step instead of the modeling step;
• introducing several new explicit feedback mechanisms (the CRISP-DM
model has only three major feedback sources, while the hybrid
model has more detailed feedback mechanisms); and
• modifying the last step, since in the hybrid model the
knowledge discovered for a particular domain may be applied in other
domains.
*
1. Understanding of the problem domain. This initial step involves
working closely with domain experts to define the problem and
determine the project goals, identifying key people, and learning about
current solutions to the problem. It also involves learning domain-
specific terminology. A description of the problem, including its
restrictions, is prepared. Finally, project goals are translated into DM
goals, and the initial selection of DM tools to be used later in the process
is performed.
2. Understanding of the data. This step includes collecting sample data
and deciding which data, including format and size, will be needed.
Background knowledge can be used to guide these efforts. Data are
checked for completeness, redundancy, missing values, plausibility of
attribute values, etc. Finally, the step includes verification of the
usefulness of the data with respect to the DM goals.
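The checks listed in step 2 can be sketched as a small data-profiling routine that counts missing values, redundant (duplicate) records, and implausible attribute values. The record layout, the age range of 0 to 120, and the `profile` function are assumptions made for this illustration.

```python
def profile(rows, fields, plausible):
    """Count missing values, duplicate records, and implausible values."""
    report = {"missing": 0, "duplicates": 0, "implausible": 0}
    seen = set()
    for row in rows:
        # Redundancy check: the same field values seen before.
        key = tuple(row.get(f) for f in fields)
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        for f in fields:
            value = row.get(f)
            if value is None:
                report["missing"] += 1
            elif f in plausible and not plausible[f](value):
                report["implausible"] += 1
    return report


sample = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 210},   # implausible value
    {"id": 1, "age": 34},    # duplicate of the first record
]

# Assumed plausibility rule: ages must lie between 0 and 120.
quality = profile(sample, ["id", "age"], {"age": lambda a: 0 <= a <= 120})
```

A report like this gives a quick basis for deciding whether the sample is usable with respect to the DM goals.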
*
Knowledge Discovery in Databases is the process by which a task is
identified and performed upon a database in order to extract
information about the elements of the database. This process involves
first collecting the data to be analysed, cleaning the data, and
reducing it to those features of interest to the process. At that point, the
tool or tools to be used on the data are identified. These tools are
then used to mine the data for information. Once the information has
been extracted, it must be evaluated for its efficacy to the process. Any
knowledge gained is then re-incorporated into the process, as
well as used for purposes outside the scope of the process.
This is a very complex process, but it is one that lends itself to a fair
degree of automation. As such, it enters into the field of artificial
intelligence, not just for the tools it employs, but for the fact that the
process tries to re-incorporate the knowledge it has created.
*
*Thank you

Knowledge discovery process
