Introduction to Datamining
A Practical View
Created by: Ngô Tùng Sơn
Part 1
Schedule:
1. Example of Datamining
2. What and Where is Datamining in the System
3. Datamining Techniques
• Data Preprocessing
• Data Analysis
• Data Visualization
1. Example of Datamining
What does the data look like?
X Y
3 3
3 1
2 2
4 6
2 3
6 7
7 5
5 6
Each row represents an object, and each column represents one of its attributes.
Can we get something from this?
Example: can we identify the groups these objects belong to? Yes.
Now forget the table and treat each row as a point in the X-Y plane.
[Figure: scatter plot of the data points on X and Y axes (0 to 8), with points A, B, C and D labelled and the scanning radius r marked]
From each data point, we find its neighbors by scanning within a radius r.
For example, A has 2 neighbors, B and C, denoted A{B,C}.
A and D have the same neighbors, so they are also considered neighbors.
Similarly: B{A,B,C,D}, C{A,B,C,D}, D{B,C}.
Points that are neighbors of each other are placed in the same group.
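The neighbor-scanning idea can be sketched in Python. This is a simplified illustration, not the author's exact procedure: it places two points in the same group whenever a chain of neighbors within radius r connects them. The function name `radius_groups` and the value r = 2.5 are assumptions chosen for this example; the points are the eight rows of the X/Y table.

```python
from math import dist

# The eight (X, Y) rows from the example table.
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]

def radius_groups(points, r):
    """Label points so that points linked by a chain of neighbors
    (each step no longer than r) share the same group label."""
    n = len(points)
    labels = [None] * n
    group = 0
    for start in range(n):
        if labels[start] is not None:
            continue  # already reached from an earlier point
        labels[start] = group
        stack = [start]
        while stack:
            i = stack.pop()
            # Scan around point i with radius r to find unlabelled neighbors.
            for j in range(n):
                if labels[j] is None and dist(points[i], points[j]) <= r:
                    labels[j] = group
                    stack.append(j)
        group += 1
    return labels

# r = 2.5 is an assumed radius, picked only for illustration.
labels = radius_groups(points, r=2.5)
print(labels)  # [0, 0, 0, 1, 0, 1, 1, 1]
```

With this radius the eight points fall into two groups. A smaller or larger r would give a different number of groups, so the radius has to be chosen with the data in mind.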
Finally, after considering all points, we have 2 groups.
[Figure: the same scatter plot (X and Y axes, 0 to 8) with the points divided into 2 groups]
What do we see here?
The data was not labelled with groups beforehand, yet we now have the groups.
This is just one example of a technique called CLUSTERING in DATAMINING.
2. What and Where is Datamining in the System
So, what exactly is Datamining?
Datamining is the set of tools and techniques used to retrieve hidden knowledge and rules from data.
The name "datamining" can be misleading: the data is already there, so we do not need to "mine" the data itself. What we dig out is the knowledge hidden in it.
For ore mining you need hammers and shovels.
For datamining, however, you need mathematics, statistics and probability, machine learning, computer programming, database techniques, ...
Where is Datamining in the system?
1. Employees / staff: day by day, staff use software (web, desktop or mobile applications) and generate data by recording all of their business activities (customers, products, order details, contracts, ...).
2. Databases: this data is added to databases through online transaction processing (OLTP).
3. Integration service: data from several OLTP data sources is collected into a common repository, the data warehouse.
4. Data mining: the datamining service accesses the data warehouse to process the data.
3. Datamining Techniques
What are the techniques in Datamining?
Many techniques can be applied in datamining.
Broadly, we can classify them into 3 groups / phases:
• Data Preprocessing
• Data Analysis
• Data Presentation
Data-Preprocessing
We can expect that the quality of collected data will often be poor.
It is necessary to clean, format and transform the data before analyzing it.
This is a very important process, but it is hard to describe in a single abstract way.
Here we will see a few examples of data preprocessing techniques:
• Similarity Measure
• Down Sampling
• Dimension Reduction
• Vectorization
Data-Preprocessing: Similarity Measure
How can we know which objects are similar?
[Figure: three points A(x1,y1), B(x2,y2) and C(x3,y3); D1 is the distance between A and B, D2 is the distance between A and C]
Distance: measure the distances AB and AC. If D1 < D2, then A is more similar to B than to C.
Angle: every point can be represented as a vector. Measure the angle between each pair of vectors: 𝜶 between A and B, 𝜷 between A and C. If 𝜶 < 𝜷, then A is more similar to B than to C.
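Both similarity measures can be tried out in a few lines of Python. The coordinates of A, B and C below are hypothetical values picked only for illustration, and `angle_between` is a helper name introduced for this sketch.

```python
from math import acos, degrees, dist, hypot

def angle_between(u, v):
    """Angle in degrees between two non-zero 2-D vectors u and v."""
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / (hypot(*u) * hypot(*v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return degrees(acos(max(-1.0, min(1.0, cos_theta))))

# Hypothetical coordinates, not taken from the slides.
A, B, C = (2, 1), (3, 2), (1, 4)

# Distance-based similarity: smaller distance means more similar.
d1, d2 = dist(A, B), dist(A, C)

# Angle-based similarity: smaller angle means more similar.
alpha, beta = angle_between(A, B), angle_between(A, C)

print(f"D1 = {d1:.2f}, D2 = {d2:.2f}")
print(f"alpha = {alpha:.1f} deg, beta = {beta:.1f} deg")
```

For these particular points both measures agree that A is closer to B than to C. In general the two can disagree, because the angle ignores the length of the vectors while the distance does not.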
Data-Preprocessing: Down Sampling
What if you have so much data that analyzing all of it is unnecessary and hurts performance?
Just pick some of the objects to evaluate.
Example: lay a grid with cell size 𝑔 over the data and keep only one object per cell.
[Figure: original data (left) and down-sampled data (right), with grid cells of size 𝑔 × 𝑔]
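Grid-based down sampling can be sketched as follows. This is one possible reading of "one object per cell": the first point seen in each cell is kept. The function name `grid_downsample` and the cell size g = 2 are assumptions for illustration, applied to the eight example points from the first section.

```python
from math import floor

def grid_downsample(points, g):
    """Keep the first point that falls into each g-by-g grid cell."""
    kept = {}
    for x, y in points:
        cell = (floor(x / g), floor(y / g))  # index of the cell containing the point
        kept.setdefault(cell, (x, y))        # later points in the same cell are dropped
    return list(kept.values())

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
sample = grid_downsample(points, g=2)
print(sample)  # [(3, 3), (3, 1), (4, 6), (6, 7), (7, 5)]
```

Other choices are equally reasonable, for example keeping the point closest to the cell center or replacing each cell by the mean of its points.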
Data-Preprocessing: Dimension Reduction
All the example data presented so far has 2 dimensions, i.e. 2 attributes (X, Y). What if each object had ~10,000 attributes?
This could reduce the performance (and/or accuracy) of data analysis algorithms, so we need to reduce the number of dimensions.
Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are 2 of the most effective methods for doing this.
Data-Preprocessing: Dimension Reduction - PCA
[Figure: original data on the X and Y axes (left); the same data projected onto the principal components 𝑃1 and 𝑃2 (right)]
We keep only the 𝑘 principal components that have the highest eigenvalues. In the example above, we can let 𝑘 = 1 and keep 𝑃1 instead of both 𝑃1 and 𝑃2.
In this way the number of dimensions is reduced.
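For the 2-D case with 𝑘 = 1, PCA can be sketched without any libraries, because the eigenvalues of a symmetric 2×2 covariance matrix have a closed form. The function name `pca_2d_to_1d` is an assumption for this sketch, and the data is the eight example points from the first section; for real, high-dimensional data one would normally rely on a numerical library instead.

```python
from math import hypot, sqrt

def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component (k = 1)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, lam1 >= lam2.
    half_trace = (a + c) / 2
    root = sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + root, half_trace - root

    # Eigenvector belonging to lam1 (direction P1), scaled to unit length.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Each point is now described by a single coordinate along P1.
    projected = [x * vx + y * vy for x, y in centered]
    return projected, (lam1, lam2)

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
projected, (lam1, lam2) = pca_2d_to_1d(points)
print("eigenvalues:", round(lam1, 3), round(lam2, 3))
print("1-D coordinates:", [round(p, 2) for p in projected])
```

The larger eigenvalue equals the variance of the data along 𝑃1, which is why keeping the components with the highest eigenvalues preserves as much of the spread of the data as possible.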
Data-Preprocessing: Vectorization
Most data analysis algorithms take a set of vectors as input, so we need to transform the collected data into a set of vectors.
Example: given the document “Mr A has not passed the exam this year. He will do it again next year”,
some important words are extracted, such as “Mr A”, “not”, “pass”, “exam”, “again”, “next”, “year”.
Measuring the frequency of each word, we get the vector that represents the document:
Mr A | not | pass | exam | again | next | year
1    | 1   | 1    | 1    | 1     | 1    | 2
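The word-frequency vector for this document can be reproduced with a short Python sketch. The vocabulary is the list of important words from the example; the `normalise` table is a toy stand-in for a real stemmer, and treating "Mr A" as the single token `mr_a` is an assumption made to keep the code small.

```python
import re
from collections import Counter

document = "Mr A has not passed the exam this year. He will do it again next year"

# Important words from the example; "mr_a" stands for the name "Mr A".
vocabulary = ["mr_a", "not", "pass", "exam", "again", "next", "year"]

# Toy normalisation table standing in for a real stemmer (assumption).
normalise = {"passed": "pass"}

# Lowercase, join the two-word name into one token, then split into words.
text = document.lower().replace("mr a", "mr_a")
tokens = [normalise.get(t, t) for t in re.findall(r"[a-z_]+", text)]

# Count every token, then read off the counts in vocabulary order.
counts = Counter(tokens)
vector = [counts[word] for word in vocabulary]
print(vector)  # [1, 1, 1, 1, 1, 1, 2]
```

Words outside the vocabulary ("has", "the", "this", ...) are counted but never read, so they do not appear in the vector.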
Data Analysis
This is the most important phase, where we find the hidden knowledge / rules in the data.
There are many techniques in this phase:
• Clustering
• Classification
• Regression
• Rule Bases
• ...
Data Analysis: Clustering
The process of clustering finds a way to group objects into groups (clusters).
Objects in the same cluster are similar to each other; objects in different clusters are not.
There are 2 types of clustering: Partitional and Hierarchical.
In this presentation we look at an example of the best-known clustering method: K-Means.
Data Analysis: Clustering - K-Means Algorithm
1. Randomly select K centers (centroids), one for each of the K clusters.
2. Calculate the distance from each object to each of the K centroids.
3. Assign each object to the group of its nearest centroid.
4. Compute the new centroid of each group.
5. Repeat from step 2 until the group assignments no longer change.
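The five steps can be sketched in plain Python as below. The function name `k_means`, the optional `centroids` argument and the `max_iter` safety limit are assumptions added for this illustration; the data is the eight example points from the first section, and the starting centroids are fixed by hand so that the run is reproducible.

```python
import random
from math import dist

def k_means(points, k, centroids=None, max_iter=100):
    """Minimal K-Means; returns (labels, centroids)."""
    # Step 1: pick K initial centroids (randomly unless they are given).
    if centroids is None:
        centroids = random.sample(points, k)
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Steps 2-3: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Step 5: stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
labels, centroids = k_means(points, k=2, centroids=[(3, 3), (6, 7)])
print(labels)     # [0, 0, 0, 1, 0, 1, 1, 1]
print(centroids)  # [(2.5, 2.25), (5.5, 6.0)]
```

With a random start (omit the `centroids` argument) the result can differ from run to run, which is one reason K-Means is often run several times with different initial centroids.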
Consider a set of data points and plot them.
Select K = 2 centroids, then repeatedly compute the new positions of the centroids.
Finally the centroids stop changing. Each object belongs to the group of its closest centroid.
The key point of the algorithm is to select a good K.
Data Analysis: Classification
How can we identify the group of an unclassified object?
Sure, we can perform clustering to do this.
However, what if we already know some objects that were classified in the past? Can we do better than clustering? Yes.
We can construct a prediction model that predicts the group of unclassified objects based on the classified objects.
This process is called CLASSIFICATION.
The process of classification can be described as follows:
[Figure: Learning Algorithm → Model]
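Data Analysis: Classification - SVM
Support Vector Machine (SVM) is a well-known classification method. In its basic form it belongs to the group of linear classifiers.
For example, consider training data classified as red and blue.
[Figure: training data points in red and blue, separated by a line, with an unclassified point marked "?"]
The classification model is a line defined by:
𝑤 : normal vector
𝑏 : bias (determines the distance from the line to the origin)
For a point (𝑥, 𝑦):
𝑤 · (𝑥, 𝑦) + 𝑏 > 0 → blue
𝑤 · (𝑥, 𝑦) + 𝑏 < 0 → red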
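The decision rule of a linear classifier is easy to express in code. The sketch below only applies a given model; it does not train an SVM. The values of `w` and `b` are hypothetical, chosen by hand for illustration, whereas a real SVM would learn them from the training data by maximizing the margin between the two classes.

```python
def classify(point, w, b):
    """Apply the linear decision rule: the sign of w . point + b picks the class."""
    score = w[0] * point[0] + w[1] * point[1] + b
    if score > 0:
        return "blue"
    if score < 0:
        return "red"
    return "on the boundary"

# Hypothetical model: the separating line is x + y - 8 = 0.
w, b = (1.0, 1.0), -8.0

print(classify((6, 7), w, b))  # blue  (6 + 7 - 8 = 5 > 0)
print(classify((2, 2), w, b))  # red   (2 + 2 - 8 = -4 < 0)
```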
Data Analysis: Regression
Regression is also used for prediction, but it predicts the missing value of an attribute rather than a group.
For example:
[Figure: data points on X and Y axes with a fitted line; a known 𝑥𝑖 on the X axis maps to an unknown 𝑦𝑖 on the Y axis]
• How do we find 𝑦𝑖 if 𝑥𝑖 is known?
• We can estimate the line that describes the data.
• Plug 𝑥𝑖 into the line equation to find 𝑦𝑖.
• This is just an example of Linear Regression.
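Estimating the line and plugging in 𝑥𝑖 can be sketched with ordinary least squares. The function name `fit_line` and the sample data are assumptions for illustration; the data was made up to lie exactly on the line y = 2x + 1 so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)

# Plug a known x_i into the line equation to estimate the missing y_i.
x_i = 5
y_i = slope * x_i + intercept
print(slope, intercept, y_i)  # 2.0 1.0 11.0
```

Real data rarely lies exactly on a line; the same formulas then return the line that minimizes the sum of squared vertical distances to the points.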
Data Analysis: Rule Base
Rule-base techniques find hidden patterns in the data.
Examples of rules found by these techniques:
• Frequent pattern: customers who normally buy rice always buy vegetables.
• Gradual pattern: young people want more expensive phones than others.
• Sequential pattern: people always buy a laptop before buying a cell phone.
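For the frequent-pattern case, a rule such as "rice → vegetable" is commonly judged by its support and confidence. The sketch below computes both on a tiny list of made-up shopping baskets; the transactions and the helper name `support` are assumptions for illustration only.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"rice", "vegetable"},
    {"rice", "vegetable", "fish"},
    {"rice"},
    {"bread"},
    {"rice", "vegetable"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Rule "rice -> vegetable":
# support    = share of baskets that contain both items
# confidence = share of rice baskets that also contain vegetable
rule_support = support({"rice", "vegetable"}, transactions)
confidence = rule_support / support({"rice"}, transactions)

print(f"support = {rule_support:.2f}, confidence = {confidence:.2f}")
```

Here 3 of the 5 baskets contain both items and 3 of the 4 rice baskets also contain vegetables, so the rule holds often but not always in this toy data.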
Data Visualization
Techniques for presenting the knowledge you have retrieved to the user.
[Figure: chart of Series 1, Series 2 and Series 3 across four categories, vertical axis from 0 to 14]
           | Series 1 | Series 2 | Series 3
Category 1 | 4.3      | 2.4      | 2
Category 2 | 2.5      | 4.4      | 2
Category 3 | 3.5      | 1.8      | 3
Category 4 | 4.5      | 2.8      | 5
Thank you for your attention

Introduction to Datamining Concept and Techniques

  • 1. Introduction to Datamining using Practical View Created : Ngô Tùng Sơn Part 1
  • 2. Schedule: 1. Example of Datamining 2. What and Where is Datamining in the System 3. Datamining Techniques  Data preprocessing  Data Analysis  Data Visualization
  • 3. How data look like? X Y 3 3 3 1 2 2 4 6 2 3 6 7 7 5 5 6 Can we get some thing from this? The row represents an object and its columns represent its attributes Ex: can we identify the group of these objects? YES 1. Example of Datamining
  • 4. Now, forget the table, consider a row as a point then we have 0 2 4 6 8 0 2 4 6 8 X Y B A C From each data point, we find its neighbors by scanning with a radius r . For Example : A will have 2 Neighbors B and C , denoted: A{B,C} r D A and D have same neighbors so they are considered as neighbors Same for B {A,B,C,D} ,C{A,B,C,D}, D{B,C} The points have neighborhood will be in the same group. 1. Example of Datamining
  • 5. Finally we have 2 groups after considering all points 0 2 4 6 8 0 2 4 6 8 X Y What do we see here? Data has not been classified into groups but we now have the groups This is just an example of technique called CLUSTERING in DATAMINING 1. Example of Datamining
  • 6. 2. What and Where is Datamining in the System So. What exactly is Datamining? Datamining is the set of tools and techniques to retrieve hidden Knowledge/Rules from data The name of datamining could make us to misunderstand Data was there, we do not need to ‘mining’ it For ore mining you need hammers and shovels  However, for datamining you need mathematic, statistic and probability, machine learning, computer programming, database techniques,...
  • 7. 2. What and Where is Datamining in the System Where is Datamining in the system? Employee/Staff Day by day, The staff using the software (Web/ Desktop/Mobile application) to generate data by recording all of his/her business activities (customers, products, order detail, contracts ,…) Database Data is added to Database Online transaction processing (OLTP) Database Database …. Data from several data sources (OLTP) will be collected to a common repository Data warehouse Integration Service Datamining service will access to the Data warehouse to process Data Mining
  • 8. 3. Datamining Techniques What are the techniques in Datamining? There are so many techniques can be applied in datamining Basically we can classify them into 3 groups / phases Data-Preprocessing Data Analysis Data Presentation
  • 10. Data Preprocessing. Collected data is often of poor quality, so it must be cleaned, formatted, and transformed before it is analyzed. This is a very important process, and it is hard to describe in an abstract way. Here are a few examples of data preprocessing techniques: similarity measure, down sampling, dimension reduction, and vectorization.
  • 11. Data Preprocessing: Similarity Measure. How can we know which objects are similar? Take three points A(x1, y1), B(x2, y2) and C(x3, y3). One option is to measure the distances D1 = AB and D2 = AC: if D1 < D2, A is more similar to B than to C. Alternatively, every point can be represented as a vector, and we measure the angle between pairs of vectors: α between A and B, β between A and C. If α < β, A is more similar to B than to C.
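Both measures fit in a short Python sketch. The coordinates of A, B and C below are hypothetical values chosen for illustration (the slide shows none), and the function names are likewise made up.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)

def angle(a, b):
    """Angle in radians between two points viewed as vectors from the origin."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# Hypothetical coordinates for the three points.
A, B, C = (2, 2), (3, 3.5), (6, 1)

d1, d2 = euclidean(A, B), euclidean(A, C)
alpha, beta = angle(A, B), angle(A, C)
print(d1 < d2, alpha < beta)
```

For these particular points both measures agree that A is closer to B. That is not guaranteed in general: distance depends on position, while the angle depends only on direction, so the two can rank neighbors differently.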
  • 12. Data Preprocessing: Down Sampling. What if you have so much data that analyzing all of it is unnecessary and hurts performance? Just pick some of the objects to evaluate. Example: lay a grid with cell size g over the original data and keep only one object per cell.
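A grid-based down sampler might look like the sketch below. It assumes 2-D points and keeps the first object that falls into each cell; the slide does not say which object of a cell should survive, so that choice, the function name `grid_downsample`, and the cell size `g = 4` are all assumptions.

```python
def grid_downsample(points, g):
    """Keep the first point seen in each g-by-g grid cell."""
    kept = {}
    for x, y in points:
        cell = (int(x // g), int(y // g))  # index of the cell containing the point
        kept.setdefault(cell, (x, y))      # later points in the same cell are dropped
    return list(kept.values())

# The (X, Y) rows from the earlier table, with an assumed cell size of 4.
points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
sample = grid_downsample(points, g=4)
print(sample)
```

A smaller `g` keeps more points and preserves more detail; a larger `g` gives a stronger reduction.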
  • 13. Data Preprocessing: Dimension Reduction. All the example data shown so far has 2 dimensions, that is, 2 attributes (X, Y). What if each object had ~10,000 attributes? That could reduce the performance (and/or accuracy) of data analysis algorithms, so we need to reduce the number of dimensions. Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) are two of the most effective methods for doing this.
  • 14. Data Preprocessing: Dimension Reduction with PCA. PCA projects the original data (axes X, Y) onto its principal components (P1, P2). We keep only the k principal components that have the highest eigenvalues. In the example above, we can set k = 1 and keep P1 instead of both P1 and P2. In this way the number of dimensions is reduced.
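For 2-D data the first principal component can be found without a linear algebra library, which makes the idea easy to see. The sketch below is only an illustration for the k = 1 case: it assumes the eight (X, Y) rows from the earlier table and uses the closed-form direction of the larger eigenvalue of a 2x2 covariance matrix. The function name `pca_first_component` is hypothetical; for real, high-dimensional data a numerical library would be the usual choice.

```python
import math

def pca_first_component(points):
    """Return the unit vector P1 and each point's coordinate along it."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Direction of the eigenvector with the largest eigenvalue (2x2 case only).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    p1 = (math.cos(theta), math.sin(theta))
    # Project the centered points onto P1: one number per point instead of two.
    scores = [(x - mx) * p1[0] + (y - my) * p1[1] for x, y in points]
    return p1, scores

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
p1, scores = pca_first_component(points)
print(p1, scores)
```

Each object is now described by a single number, its position along P1, and P1 is the direction along which the data varies the most.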
  • 15. Data Preprocessing: Vectorization. Most data analysis algorithms take a set of vectors as input, so we need to transform the collected data into a set of vectors. Example: given the document "Mr A has not passed the exam this year. He will do it again next year", some of the important words are extracted: "Mr A", "not", "pass", "exam", "again", "next", "year". Measuring the frequency of each word gives the vector that represents the document: Mr A = 1, not = 1, pass = 1, exam = 1, again = 1, next = 1, year = 2.
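The word counting in this example can be reproduced with a small Python sketch. It assumes the list of important words is already given, and it uses a crude prefix match (so that "pass" also counts "passed") as a stand-in for proper stemming; the function name `vectorize` is made up for illustration.

```python
import re

def vectorize(document, vocabulary):
    """Count how often each vocabulary term (or a word starting with it) occurs."""
    vector = []
    for term in vocabulary:
        # \b anchors at a word start; \w* lets "pass" match "passed".
        pattern = r"\b" + re.escape(term) + r"\w*"
        vector.append(len(re.findall(pattern, document, flags=re.IGNORECASE)))
    return vector

document = "Mr A has not passed the exam this year. He will do it again next year"
vocabulary = ["Mr A", "not", "pass", "exam", "again", "next", "year"]
vector = vectorize(document, vocabulary)
print(dict(zip(vocabulary, vector)))
```

The prefix match is a shortcut that can overcount (for instance, "not" would also match "nothing"), so a real pipeline would normally tokenize and stem the text first.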
  • 17. Data Analysis. There are many techniques in this phase: clustering, classification, regression, rule bases, ... This is the most important phase, where we find the hidden knowledge and rules in the data.
  • 18. Data Analysis: Clustering. Clustering is the process of grouping objects into groups (clusters) so that objects in the same cluster are similar to each other and objects in different clusters are not. There are 2 types of clustering: partitional and hierarchical. In this presentation we look at an example of the most famous clustering method: K-means.
  • 19. Data Analysis: Clustering with the K-means Algorithm.
      1. Randomly select K centers (centroids) for the K clusters.
      2. Calculate the distance from each object to each of the K centroids.
      3. Assign each object to the group of its nearest centroid.
      4. Compute the new centroid of each group.
      5. Repeat from step 2 until the group assignments no longer change.
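The five steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the `init` parameter (fixed starting centroids, used here to make the run reproducible), the `max_iter` safety limit, and the reuse of the eight (X, Y) rows from the earlier table are all assumptions added for the example.

```python
import math
import random

def kmeans(points, k, init=None, max_iter=100):
    """Return (labels, centroids) after running the K-means steps."""
    # Step 1: pick K starting centroids (randomly unless init is given).
    centroids = list(init) if init is not None else random.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 3: assign each object to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

points = [(3, 3), (3, 1), (2, 2), (4, 6), (2, 3), (6, 7), (7, 5), (5, 6)]
labels, centroids = kmeans(points, k=2, init=[(3, 3), (4, 6)])
print(labels, centroids)
```

With a random start (omit `init`) the result can differ from run to run, which is one reason K-means is often run several times and the best result kept.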
  • 20. Data Analysis: Clustering with the K-means Algorithm. Consider the data below. Plotting it, we have:
  • 21. Data Analysis: Clustering with the K-means Algorithm. Select K = 2 centroids. Compute the new positions of the centroids. Finally, the centroids stop changing, and each object belongs to the group of its closest centroid. The key point of the algorithm is to select a good K.
  • 22. Data Analysis: Classification. How can we identify the group of an unclassified object? We could perform clustering to do this. However, what if we already know some objects that were classified in the past? Can we do better than clustering? Yes. We can construct a prediction model that predicts the group of unclassified objects based on the classified ones. This process is called CLASSIFICATION.
  • 23. Data Analysis: Classification. The process of classification can be described as below. [Diagram labels: Learning Algorithm, Model]
  • 24. Data Analysis: Classification with SVM. Support Vector Machine (SVM) is a well-known classification method. In its basic form it belongs to the group of linear classifiers. For example, take training data classified as red and blue. The classification model is a line described by w, its normal vector, and b, the bias, which sets the line's offset from the origin. For a new point (x, y): if w · (x, y) + b > 0 the point is blue; if w · (x, y) + b < 0 the point is red.
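The decision rule of such a linear model is short enough to write out. The sketch below shows only the prediction step: the values of `w` and `b` are hypothetical and picked by hand, not learned by an SVM training procedure, and the function name `classify` is made up for illustration.

```python
def classify(point, w, b):
    """Apply the linear rule: the sign of w . point + b decides the color."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    # Points exactly on the line (score == 0) fall to "red" here by convention.
    return "blue" if score > 0 else "red"

# Hand-picked, hypothetical model: the line x + y = 8.5.
w, b = (1.0, 1.0), -8.5

low_point = classify((3, 3), w, b)   # 3 + 3 - 8.5 < 0
high_point = classify((6, 7), w, b)  # 6 + 7 - 8.5 > 0
print(low_point, high_point)
```

Training an SVM means finding the `w` and `b` that separate the labeled training data with the widest possible margin; once they are known, classifying a new object costs only this one dot product.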
  • 25. Data Analysis: Regression. Regression is also used for prediction, but to predict the missing value of an attribute. For example, given points on an X-Y plot, how do we find yi if xi is known? We can estimate the line that describes the data, then plug xi into the line equation to find yi. This is just an example of linear regression.
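One common way to estimate that line is ordinary least squares; the slide does not name a fitting method, so treat this choice as an assumption. The data values and the function name `fit_line` below are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimate of slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical known objects, followed by one whose y value is missing.
xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

x_i = 5
y_i = slope * x_i + intercept  # plug x_i into the line equation
print(slope, intercept, y_i)
```

The example data lies exactly on a line, so the fit is perfect; with noisy data the same formulas give the line that minimizes the sum of squared vertical errors.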
  • 26. Data Analysis: Rule Base. Rule base techniques find hidden patterns in the data. Examples:
      Frequent pattern: customers who normally buy rice always buy vegetables.
      Gradual pattern: young people want more expensive phones than others.
      Sequential pattern: people always buy a laptop before buying a cell phone.
  • 28. Data Visualization. Techniques for presenting the knowledge you have retrieved to the user. [Chart: Series 1, Series 2 and Series 3 plotted for four categories]
      Category 1: Series 1 = 4.3, Series 2 = 2.4, Series 3 = 2
      Category 2: Series 1 = 2.5, Series 2 = 4.4, Series 3 = 2
      Category 3: Series 1 = 3.5, Series 2 = 1.8, Series 3 = 3
      Category 4: Series 1 = 4.5, Series 2 = 2.8, Series 3 = 5
  • 29. Thank you for your attention