K-NEAREST NEIGHBOR CLASSIFIER
Ajay Krishna Teja Kavuri
ajkavuri@mix.wvu.edu
OUTLINE
• BACKGROUND
• DEFINITION
• K-NN IN ACTION
• K-NN PROPERTIES
• REMARKS
BACKGROUND
“Classification is a data mining technique used to predict group
membership for data instances.”
• Group membership is then used to make predictions about
future data sets.
ORIGINS OF K-NN
• Nearest neighbors have been used in statistical estimation and
pattern recognition since the early 1970s as a non-parametric
technique.
• The method has spread to several disciplines and is still one
of the top 10 data mining algorithms.
MOST CITED PAPERS
K-NN has several variations that came out of research on optimizing
it. The following are among the most cited publications:
• Approximate nearest neighbors: towards removing the curse of dimensionality
Piotr Indyk, Rajeev Motwani
• Nearest neighbor queries
Nick Roussopoulos, Stephen Kelley, Frédéric Vincent
• Machine learning in automated text categorization
Fabrizio Sebastiani
IN A SENTENCE K-NN IS…..
• It’s how people judge someone by observing their peers.
• We tend to associate with people who have
similar attributes, and so does data.
DEFINITION
• K-Nearest Neighbor is a lazy learning algorithm
that classifies data items based on their similarity to their
neighbors.
• “K” stands for the number of neighboring data items
that are considered for the classification.
Ex: The image shows the classification for different values of K.
TECHNICALLY…..
• Given the attributes A = {X1, X2, …, XD}, where D is the
dimension of the data, we need to predict the corresponding
classification group from G = {Y1, Y2, …, Yn} using a proximity
metric over K items in D dimensions that defines the closeness
of association, such that X ∈ ℝ^D and Yp ∈ G.
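The definition above can be sketched in Python as follows. This is a minimal illustration, assuming Euclidean distance (via `math.dist`, Python 3.8+) as the proximity metric and a plain majority vote among the K nearest items; the function name `knn_classify` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    """Predict the group of `query` by majority vote among its k nearest stored items.

    Assumes Euclidean distance as the proximity metric (one possible choice).
    """
    # Distance from the query to every stored item X in R^D
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest items
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over their group labels; on a tie, the tied label
    # that appears first among the nearest items wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data in D = 2 dimensions
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["Y1", "Y1", "Y2", "Y2"]
print(knn_classify(train_X, train_y, (1.2, 1.4), k=3))  # → Y1
```

Swapping `math.dist` for another function changes the proximity metric without touching the voting logic.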
THAT IS….
• Attribute A={Color, Outline, Dot}
• Classification Group,
G={triangle, square}
• D = 3; we are free to choose the value of K.
[Figure: Attributes A vs. Classification Group]
PROXIMITY METRIC
• Definition: Also termed a “similarity measure”, a proximity
metric quantifies the association among different items.
• The following table lists measures for different data formats:
Similarity Measure | Data Format
Contingency Table, Jaccard Coefficient, Distance Measures | Binary
Z-Score, Min-Max Normalization, Distance Measures | Numeric
Cosine Similarity, Dot Product | Vectors
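A few of the measures in the table can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, the all-zero case of `jaccard` uses an assumed convention, and Z-score and min-max are shown as rescaling steps applied to numeric attributes before a distance measure is computed.

```python
import math

def jaccard(a, b):
    """Jaccard coefficient of two binary vectors: |A AND B| / |A OR B|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-zero vectors count as identical
    return both / either if either else 1.0

def min_max(values):
    """Rescale numeric values to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize numeric values to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(round(jaccard([1, 1, 0, 1], [1, 0, 0, 1]), 3))  # → 0.667
print(min_max([2, 4, 6]))                             # → [0.0, 0.5, 1.0]
print(round(cosine([1, 0], [0, 1]), 3))               # → 0.0
```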
PROXIMITY METRIC
• For numeric data, let us consider some distance measures:
– Manhattan Distance: dist(X,Y) = Σ |xi − yi|, summing over i = 1, …, D
– Ex: Given X = {1,2} & Y = {2,5}
Manhattan Distance = dist(X,Y) = |1-2|+|2-5|
= 1+3
= 4
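The Manhattan example can be checked with a short Python sketch; the function name `manhattan` is hypothetical.

```python
def manhattan(x, y):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension D")
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan([1, 2], [2, 5]))  # → 4
```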
PROXIMITY METRIC
– Euclidean Distance: dist(X,Y) = [ Σ (xi − yi)^2 ]^(1/2), summing over i = 1, …, D
– Ex: Given X = {-2,2} & Y = {2,5}
Euclidean Distance = dist(X,Y) = [ (-2-2)^2 + (2-5)^2 ]^(1/2)
= (16 + 9)^(1/2)
= 5
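The Euclidean example can be reproduced the same way. The hypothetical `euclidean` function below spells out the formula; since Python 3.8, `math.dist` computes the same value.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension D")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean([-2, 2], [2, 5]))  # → 5.0
```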
K-NN IN ACTION
• Consider the following data:
A={weight,color}
G={Apple(A), Banana(B)}
• We need to predict the type of a
fruit with:
weight = 378
color = red
SOME PROCESSING….
• Assign color codes to convert the colors into numerical data:
• Let’s label Apple as “A” and
Banana as “B”.
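A minimal sketch of this preprocessing step, assuming hypothetical color codes (the actual code table is not given in the text): each (weight, color) record becomes a numeric vector, and the class names map to the labels “A” and “B”.

```python
# Hypothetical color codes, assumed for illustration only
COLOR_CODES = {"red": 0, "green": 1, "yellow": 2}
# Class labels as described: Apple -> "A", Banana -> "B"
LABELS = {"Apple": "A", "Banana": "B"}

def encode(weight, color):
    """Turn a (weight, color) record into a numeric feature vector."""
    return (float(weight), float(COLOR_CODES[color]))

query = encode(378, "red")
print(query)             # → (378.0, 0.0)
print(LABELS["Banana"])  # → B
```

Two caveats apply to this kind of encoding: integer codes impose an order on colors that has no real meaning, and a weight in the hundreds will dominate a small color code in any distance computation unless the attributes are normalized first (for example with min-max normalization from the proximity metric table).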
PLOTTING
• Using K = 3,
our result is:
AS K VARIES….
• Clearly, K has an impact on the classification.
Can you guess how?
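To see why K matters, here is a small sketch: the same ranked list of neighbor labels gives different majority votes depending on how many neighbors are counted. The ranked labels are hypothetical, not taken from the fruit example.

```python
from collections import Counter

def vote(neighbor_labels, k):
    """Majority vote over the labels of the k nearest neighbors.

    `neighbor_labels` must already be sorted from nearest to farthest.
    """
    return Counter(neighbor_labels[:k]).most_common(1)[0][0]

# Hypothetical neighbor labels, sorted by distance to the query
ranked = ["A", "B", "B", "A", "A"]
for k in (1, 3, 5):
    print(k, vote(ranked, k))
# 1 A
# 3 B
# 5 A
```

With two classes, an odd K avoids tied votes, which is one reason values such as 3 or 5 are common choices.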
K-NN LIVE!!
• http://www.ai.mit.edu/courses/6.034b/KNN.html
K-NN VARIATIONS
• Weighted K-NN: Assigns a weight to each attribute, which lets
some attributes take priority over others.
Ex: For the data,
Weight:
Probability:
Where,
The above is the resulting dataset.
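The text does not spell out the weighting formula, so this sketch assumes one simple form of attribute weighting: each attribute's squared difference is multiplied by a non-negative weight inside the Euclidean distance. The function name and weights are hypothetical.

```python
import math

def weighted_euclidean(x, y, w):
    """Euclidean distance where attribute j's squared difference is scaled by w[j] >= 0.

    This weighting form is an assumption for illustration.
    """
    return math.sqrt(sum(wj * (a - b) ** 2 for a, b, wj in zip(x, y, w)))

x, y = (1.0, 10.0), (4.0, 14.0)
print(weighted_euclidean(x, y, (1.0, 1.0)))  # → 5.0 (plain Euclidean)
print(weighted_euclidean(x, y, (1.0, 0.0)))  # → 3.0 (second attribute ignored)
print(weighted_euclidean(x, y, (0.0, 1.0)))  # → 4.0 (first attribute ignored)
```

Raising an attribute's weight makes differences in that attribute count more when the nearest neighbors are chosen.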
K-NN VARIATIONS
• (K-l)-NN: Restricts the associations by putting a threshold l on
the majority, so an element is classified only when enough of its
K neighbors agree.
Ex: Classify only if the majority count is over the given
threshold l; otherwise, reject.
Here, K = 5 and l = 4. As no class has a count > 4, we reject
classifying the element.
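The rejection rule can be sketched as follows, reading “over the threshold” as a strict count > l, as in the K = 5, l = 4 example. The function name and the ranked labels are hypothetical.

```python
from collections import Counter

def knn_with_reject(neighbor_labels, k, l):
    """(K-l)-NN decision: return the majority label only if its count exceeds l.

    `neighbor_labels` holds the labels of stored items sorted by distance to the query.
    Returns None to signal that the element is rejected (left unclassified).
    """
    label, count = Counter(neighbor_labels[:k]).most_common(1)[0]
    return label if count > l else None

# Hypothetical labels of the 5 nearest neighbors
ranked = ["square", "triangle", "square", "square", "triangle"]
print(knn_with_reject(ranked, k=5, l=4))  # → None (best count is 3, not > 4)
print(knn_with_reject(ranked, k=5, l=2))  # → square (3 > 2)
```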
K-NN PROPERTIES
• K-NN is a lazy algorithm.
• The processing differs depending on the value of K.
• The result is generated only after analyzing the stored data.
• It does not compute any intermediate values.
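The lazy behavior can be made visible with a small sketch that counts distance computations: `fit` only stores the data, while each prediction compares the query with every stored item. The class `LazyKNN` and its counter are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import Counter

class LazyKNN:
    """Lazy K-NN: fit() only stores the data; all distance work happens at prediction time."""

    def __init__(self, k):
        self.k = k
        self.distance_calls = 0  # counts distance computations, for illustration

    def fit(self, X, y):
        # No model is built: the "training" step just keeps the data
        self.X, self.y = list(X), list(y)
        return self

    def _dist(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)

    def predict_one(self, query):
        # Every stored item is compared with the query: O(n * D) work per query
        ranked = sorted(zip(self.X, self.y), key=lambda item: self._dist(item[0], query))
        return Counter(label for _, label in ranked[: self.k]).most_common(1)[0][0]

model = LazyKNN(k=1).fit([(0, 0), (1, 1), (5, 5)], ["A", "A", "B"])
print(model.distance_calls)       # → 0 (fit did no distance work)
print(model.predict_one((4, 4)))  # → B
print(model.distance_calls)       # → 3 (one per stored item)
```

This brute-force search is the simplest choice; it is also why the test stage becomes expensive as the number of stored samples grows.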
REMARKS: FIRST THE GOOD
Advantages
• Can be applied to data from any distribution;
for example, the data does not have to be separable by a linear
boundary
• Very simple and intuitive
• Good classification if the number of samples is large enough
NOW THE BAD….
Disadvantages
• Dependent on the value of K
• The test stage is computationally expensive
• There is no training stage; all the work is done during the test stage
• This is actually the opposite of what we want: usually we can
afford a long training step, but we want a fast test step
• Needs a large number of samples for accuracy
THANK YOU
