Cluster Analysis

    Anindita
Cluster analysis
A class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. It is also known as classification analysis or numerical taxonomy.

Example: clustering respondents on two variables, quality consciousness (var1) and price sensitivity (var2).
Cluster analysis requires no prior information about the group or cluster membership of any object in the sample.
Uses of Cluster Analysis
• Segmenting the market (e.g., grouping consumers by the benefits they seek)
• Understanding buyer behavior
• Identifying new product opportunities (by clustering brands or markets)
• Selecting test markets (by grouping similar cities)
• Reducing data (summarizing many objects in a few clusters)
Steps
• Formulate the problem: select relevant variables measured on an interval scale.
• Select a distance measure that captures how similar or different the objects are; the most commonly used measure is Euclidean distance.
• Select a clustering procedure.
• Decide on the number of clusters.
• Interpret and profile the clusters.
• Assess the reliability and validity of the clustering.
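Euclidean distance, the measure named in the steps above, is the square root of the sum of squared differences between two cases across all clustering variables. A minimal Python sketch follows; the function names and the three respondents' ratings are hypothetical, for illustration only. Because the distance is dominated by variables with larger ranges, variables measured on different scales are usually standardized first.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two cases rated on the same interval-scaled variables."""
    if len(a) != len(b):
        raise ValueError("both cases must be measured on the same variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(cases):
    """Symmetric matrix of pairwise Euclidean distances (zeros on the diagonal)."""
    return [[euclidean_distance(p, q) for q in cases] for p in cases]


# Hypothetical ratings of three respondents on var1 (quality consciousness)
# and var2 (price sensitivity).
respondents = [(6, 4), (2, 3), (7, 2)]
for row in distance_matrix(respondents):
    print([round(d, 3) for d in row])
```

Here respondents 1 and 3 are the closest pair (about 2.236 apart), so an agglomerative procedure would merge them first.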
Types
• Hierarchical
a) Agglomerative
   1. Linkage methods (single, complete and average)
   2. Variance methods (Ward’s)
b) Divisive
• Non-hierarchical (k-means)
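The three linkage rules listed above differ only in how the distance between two clusters is computed: single linkage uses the closest pair of members, complete linkage the farthest pair, and average linkage the mean over all between-cluster pairs. The sketch below is a deliberately naive agglomerative procedure (it recomputes every cluster distance at each stage, so it only suits small samples); the function names and the five one-variable cases are hypothetical, and Ward's variance method is not covered.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linkage_distance(c1, c2, cases, method):
    """Distance between two clusters (lists of case indices) under a linkage rule."""
    pair_dists = [euclidean(cases[i], cases[j]) for i in c1 for j in c2]
    if method == "single":    # nearest neighbour: closest pair of members
        return min(pair_dists)
    if method == "complete":  # farthest neighbour: most distant pair of members
        return max(pair_dists)
    if method == "average":   # mean distance over all between-cluster pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(cases, method="average"):
    """Start with one cluster per case and repeatedly merge the closest pair.

    Returns one (members_a, members_b, distance) tuple per stage.
    """
    clusters = [[i] for i in range(len(cases))]
    stages = []
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], cases, method),
        )
        dist = linkage_distance(clusters[a], clusters[b], cases, method)
        stages.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return stages


# Hypothetical one-variable scores for five cases (indices 0-4).
cases = [(0,), (1,), (5,), (6,), (20,)]
for method in ("single", "complete", "average"):
    print(method, [d for _, _, d in agglomerate(cases, method)])
```

On this toy data all three rules make the same first two merges, but they join the remaining clusters at different distances (single linkage at 4 and 14, complete at 6 and 20, average at 5 and 17).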
Steps in SPSS
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move the clustering variables into the VARIABLES box.
4. In the CLUSTER box, check CASES. In the DISPLAY box, check STATISTICS and PLOTS.
5. Click STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box …
Hierarchical clustering
Agglomeration Schedule
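• “Stage”: with 20 respondents there are 19 stages; two clusters are merged at each stage until all respondents form one cluster.
• “Clusters Combined”: the two clusters merged at each stage. At stage 1, respondents 14 and 16 are combined.
• “Coefficients”: the distance between the two clusters being combined; at stage 1 this is the Euclidean distance between respondents 14 and 16.
• “Stage Cluster First Appears”: the stage at which each of the clusters being combined was formed (0 means the cluster is still a single respondent). The entry of 1 at stage 6 shows that the cluster containing respondent 14 was first formed at stage 1.
• “Next Stage”: the stage at which another case or cluster is next combined with this one. The entry for stage 1 is 6, so at stage 6 respondent 10 is combined with the cluster containing 14 and 16 to form a single cluster.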
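The “Stage Cluster First Appears” and “Next Stage” columns can be derived from the list of merges alone. The sketch below assumes SPSS-style labelling, in which a merged cluster is identified by the smaller of its two case numbers (consistent with the cluster of 14 and 16 being listed as 14 at stage 6); the function name and the four-case merge list are hypothetical.

```python
def agglomeration_schedule(merges):
    """
    Build SPSS-style agglomeration schedule rows from a list of merges.

    merges: one (label_a, label_b, coefficient) tuple per stage, in merge order.
    Assumption: a merged cluster keeps the smaller of its two labels.
    Returns (stage, cluster_1, cluster_2, coefficient,
             first_appears_1, first_appears_2, next_stage) tuples.
    """
    formed_at = {}  # label -> stage at which the cluster carrying it was last formed
    kept = []       # label that survives each stage
    rows = []
    for stage, (a, b, coef) in enumerate(merges, start=1):
        a, b = min(a, b), max(a, b)
        first_a = formed_at.get(a, 0)  # 0 = still a single case
        first_b = formed_at.get(b, 0)
        formed_at[a] = stage
        formed_at.pop(b, None)         # label b is absorbed into a
        kept.append(a)
        rows.append([stage, a, b, coef, first_a, first_b, 0])
    # Next Stage: the first later stage in which the surviving label is merged again.
    for i, label in enumerate(kept):
        for later in rows[i + 1:]:
            if label in (later[1], later[2]):
                rows[i][6] = later[0]
                break
    return [tuple(row) for row in rows]


# Hypothetical four-case example.
merges = [(1, 3, 1.0), (2, 4, 1.5), (1, 2, 4.0)]
print("Stage  Cl.1  Cl.2  Coef.  First1  First2  Next")
for row in agglomeration_schedule(merges):
    print("{:>5}  {:>4}  {:>4}  {:>5.2f}  {:>6}  {:>6}  {:>4}".format(*row))
```

Under the same labelling assumption, the schedule described above would give stage 1 (14 and 16) a Next Stage of 6, and stage 6 (10 and 14) a “first appears” value of 1 for cluster 14.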
Icicle plot
• Columns correspond to the objects being clustered, respondents 1 through 20.
• Rows correspond to the number of clusters.
• The figure is read from bottom to top.
• Initially, every case is considered a separate cluster, giving 20 initial clusters.
• In the first step, the two closest objects, respondents 14 and 16, are combined, resulting in 19 clusters; the X's between their columns mark the merge.
• Row 18 corresponds to 18 clusters: respondents 6 and 7 are combined. At this point 16 clusters consist of single respondents and two clusters contain two respondents each.
• Each subsequent step forms a new cluster by combining two existing clusters.
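Each row of the icicle plot corresponds to the partition that exists after a given number of merges. The sketch below replays a merge list and records the clusters present at every number of clusters, from n down to 1, which is the information the icicle plot encodes with its X's. It assumes that a merged cluster keeps the smaller case number; the function name and the four-case merges are hypothetical.

```python
def partitions_by_k(n_cases, merges):
    """
    Replay (label_a, label_b) merges for cases labelled 1..n_cases and return
    {number_of_clusters: clusters present}, from n_cases down to 1.
    Assumption: a merged cluster keeps the smaller of its two labels.
    """
    clusters = {i: [i] for i in range(1, n_cases + 1)}  # label -> member cases
    rows = {n_cases: sorted(clusters.values())}
    for a, b in merges:
        keep, drop = min(a, b), max(a, b)
        clusters[keep] = sorted(clusters[keep] + clusters.pop(drop))
        rows[len(clusters)] = sorted(clusters.values())
    return rows


# Hypothetical four-case merge sequence, printed from the most clusters
# (bottom of the icicle plot) to a single cluster (top).
rows = partitions_by_k(4, [(1, 3), (2, 4), (1, 2)])
for k in sorted(rows, reverse=True):
    print(k, rows[k])
```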
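Dendrogram
• Read from left to right.
• Vertical lines represent clusters that are joined together.
• The position of each vertical line on the distance scale shows the distance at which those clusters were joined.
• Many early merges occur at similar, small distances, so their order is hard to tell apart; as the distances increase, the cluster structure becomes clear.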
Deciding the Clusters
• Theoretical, conceptual or practical considerations can suggest the number of clusters.
• In hierarchical clustering, the distances at which clusters are combined can be used as a criterion. The value in the “Coefficients” column suddenly more than doubles between stage 17 (three clusters) and stage 18 (two clusters), which suggests a three-cluster solution. The same jump can be seen in the last two stages of the dendrogram.
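The coefficient-jump rule can be automated as a rough heuristic: find the stage at which the agglomeration coefficient grows most relative to the previous stage, and stop merging just before it. After stage s of an n-case analysis, n - s clusters remain, so a jump between stages 17 and 18 with 20 cases points to 20 - 17 = 3 clusters. This is only a sketch; the function name and the six-case coefficients are hypothetical, ratios between very small early coefficients can also be large, and the result should be weighed against the practical and conceptual considerations above.

```python
def suggest_number_of_clusters(coefficients, n_cases):
    """
    coefficients[i] is the agglomeration coefficient at stage i + 1 (stages 1..n_cases-1).
    Returns (suggested number of clusters, largest stage-to-stage ratio).
    """
    best_stage, best_ratio = None, 0.0
    for stage in range(2, len(coefficients) + 1):
        prev, curr = coefficients[stage - 2], coefficients[stage - 1]
        if prev > 0 and curr / prev > best_ratio:
            best_stage, best_ratio = stage, curr / prev
    if best_stage is None:
        raise ValueError("need at least two positive coefficients")
    # Stop at the stage before the jump: n_cases - (best_stage - 1) clusters remain.
    return n_cases - (best_stage - 1), best_ratio


coefs = [1.0, 1.2, 1.5, 2.0, 6.0]  # hypothetical: six cases, five stages
k, ratio = suggest_number_of_clusters(coefs, n_cases=6)
print(f"largest jump ratio {ratio:.2f}; suggested clusters: {k}")
```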
Interpreting and profiling the clusters
• Cluster 1 has high values on V1 (shopping is fun) and V3 (I combine shopping with eating out) and a low value on V5 (I don’t care about shopping). It can be labeled “fun-loving and concerned shoppers” and consists of cases 1, 3, 6, 7, 8, 12, 15 and 17.
• Cluster 2 is just the opposite, with low values on V1 and V3 and a high value on V5, so it can be labeled “apathetic shoppers”. It consists of cases 2, 5, 9, 11, 13 and 20.
• Cluster 3 has high values on V2 (shopping upsets the budget), V4 (I try to get the best buys) and V6 (comparing prices saves money), so it can be labeled “economical shoppers”. It consists of cases 4, 10, 14, 16, 18 and 19.
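Profiling rests on the cluster centroids, that is, the mean of each clustering variable within each cluster, compared with the overall mean. The sketch below computes both and tags each variable as relatively high or low for each cluster; the function names, the two-variable ratings and the cluster labels are hypothetical, not the shopping-attitude data summarized above.

```python
def cluster_profiles(data, labels):
    """Mean of every variable within each cluster (the cluster centroids)."""
    sums, counts = {}, {}
    for row, label in zip(data, labels):
        totals = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            totals[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [t / counts[label] for t in sums[label]] for label in sorted(sums)}


def high_low(profiles, data):
    """Tag each centroid value 'high' if above the overall variable mean, else 'low' (ties count as low)."""
    n = len(data)
    overall = [sum(col) / n for col in zip(*data)]
    return {
        label: ["high" if m > g else "low" for m, g in zip(means, overall)]
        for label, means in profiles.items()
    }


# Hypothetical ratings of four respondents on two variables, in two clusters.
data = [(6, 2), (7, 1), (2, 6), (1, 7)]
labels = [1, 1, 2, 2]
profiles = cluster_profiles(data, labels)
print(profiles)
print(high_low(profiles, data))
```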
Non-hierarchical Clustering
• The initial cluster centers are the values of three randomly selected cases. Each case is then assigned to the nearest cluster center (its classification cluster center).
• The results also display the cluster membership of each case and its distance from its classification center.
• Cluster 1 of the hierarchical solution is the same as cluster 3 of the non-hierarchical solution.
• Cluster 3 of the hierarchical solution is the same as cluster 1 of the non-hierarchical solution.
• The distances between the final cluster centers indicate that the pairs of clusters are well separated.
• A univariate F test is reported for each clustering variable. These tests are descriptive only: because cases were assigned to clusters so as to maximize the differences between clusters, the observed significance levels cannot be interpreted as tests of equal cluster means.
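The non-hierarchical procedure described above follows the standard k-means loop: take k cases as initial centers, assign each case to its nearest center, recompute each center as the mean of its cases, and repeat until the assignments stop changing. The sketch below is plain Python (3.8+ for `math.dist`); the function name, its `seed` and `max_iter` parameters and the toy data are hypothetical, and SPSS's own K-Means Cluster procedure may choose initial centers and iterate differently.

```python
import math
import random


def kmeans(data, k, seed=0, max_iter=100):
    """Return (labels, final centers, distance of each case to its classification center)."""
    rng = random.Random(seed)
    # Initial cluster centers: the values of k randomly selected cases.
    centers = [list(data[i]) for i in rng.sample(range(len(data)), k)]
    labels = None
    for _ in range(max_iter):
        # Assign every case to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(row, centers[c])) for row in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of the cases assigned to it.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    distances = [math.dist(row, centers[lab]) for row, lab in zip(data, labels)]
    return labels, centers, distances


# Hypothetical two-variable data with two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, distances = kmeans(data, k=2)
print(labels)
print([[round(v, 2) for v in c] for c in centers])
```

Cluster numbers from k-means are arbitrary, which is why the same group can appear as cluster 1 in one solution and cluster 3 in another.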
Two-Step Clustering
• The AIC reaches its minimum (97.594) for a three-cluster solution. A comparison of cluster centroids shows that cluster 1 (two-step) corresponds to cluster 2 (hierarchical), and cluster 2 (two-step) corresponds to cluster 3 (hierarchical).
• The agreement between the results of the different methods supports the validity of the clustering.
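Besides comparing centroids, the correspondence between two solutions can be checked on cluster membership: cross-tabulate the two sets of labels for the same cases and pair each cluster with its best-matching counterpart. The sketch below uses a simple greedy match (each cluster of the first solution is paired with the cluster of the second that shares the most cases, so the pairing is not guaranteed to be one-to-one); the function name and label lists are hypothetical.

```python
from collections import Counter


def match_clusters(labels_a, labels_b):
    """
    Cross-tabulate two cluster solutions for the same cases and pair each cluster
    of solution A with the cluster of solution B that shares the most cases.
    Returns (mapping, share of cases whose paired labels agree).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must cover the same cases")
    crosstab = Counter(zip(labels_a, labels_b))
    b_labels = sorted(set(labels_b))
    mapping = {a: max(b_labels, key=lambda b: crosstab[(a, b)]) for a in sorted(set(labels_a))}
    agreement = sum(mapping[a] == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return mapping, agreement


# Hypothetical memberships of six cases under two methods.
hierarchical = [1, 1, 2, 2, 3, 3]
two_step = [2, 2, 3, 3, 1, 1]
print(match_clusters(hierarchical, two_step))
```

An agreement share close to 1 indicates that the two methods recover essentially the same groups under different cluster numbers.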
