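12. PROC ACECLUS output from the Poverty data set (p = 3): Q-Q plots to check multivariate normality (MVN) of the transformed variables (can1, can2, can3), which Ward's method assumes. Marginal normality of each variable is necessary, though not sufficient, for MVN. The Q-Q correlation coefficients are r_Q(can1) = 0.951, r_Q(can2) = 0.981 and r_Q(can3) = 0.976, with n = 97. All three fall below the critical point of 0.9895 at α = 0.1, so normality is rejected for each variable. A more thorough investigation would involve outlier detection and removal as well as testing data transformations (Box-Cox).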
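The sketch below shows how r_Q values like these can be computed. The Python function correlates the sorted observations with standard normal quantiles. It assumes the common probability levels (j - 0.5)/n. The function names are hypothetical, and the critical point 0.9895 is the slide's value; the code does not derive it.

```python
from statistics import NormalDist, mean


def qq_correlation(values):
    """Q-Q plot correlation coefficient r_Q: the correlation between the
    ordered observations and standard normal quantiles at (j - 0.5)/n."""
    x = sorted(values)
    n = len(x)
    q = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    mx, mq = mean(x), mean(q)
    sxq = sum((a - mx) * (b - mq) for a, b in zip(x, q))
    sxx = sum((a - mx) ** 2 for a in x)
    sqq = sum((b - mq) ** 2 for b in q)
    return sxq / (sxx * sqq) ** 0.5


def rejects_normality(r_q, critical_point):
    # Normality is rejected when r_Q falls below the tabulated critical point.
    return r_q < critical_point


# The slide's reported values checked against its critical point (alpha = 0.1).
reported = {"can1": 0.951, "can2": 0.981, "can3": 0.976}
for name, r_q in reported.items():
    print(name, "reject" if rejects_normality(r_q, 0.9895) else "do not reject")
```

If the data were exactly the normal quantiles, r_Q would be 1. A single large outlier pulls r_Q well below any usual critical point, which is why outlier screening belongs in the more thorough investigation.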
13. Minimal code needed for a cluster analysis. Annotations:
- Generate a data set with only the resulting clustering (the number of clusters we wish to examine), for use in plotting if needed.
- Sampling proportion (the PROPORTION= option of PROC ACECLUS): try values from 0.01 to 0.5.
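Outside SAS, keeping only one clustering amounts to replaying part of the merge history. The sketch below is not SAS code. It plays roughly the role of PROC TREE's NCLUSTERS= and OUT= options. It assumes a hypothetical merge-history format: ids 0..n-1 are single items, and the i-th merge creates id n + i.

```python
def cut_tree(n_items, merges, k):
    """Cluster labels (1..k) for each item after replaying the first
    n_items - k merges.

    Hypothetical convention: ids 0..n_items-1 are single items and the
    i-th merge (0-based) in `merges` creates cluster id n_items + i.
    """
    if not 1 <= k <= n_items:
        raise ValueError("k must be between 1 and n_items")
    if len(merges) < n_items - k:
        raise ValueError("not enough merges to reach k clusters")
    parent = list(range(n_items + len(merges)))

    def find(i):
        # Follow parent pointers up to the current root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for step, (a, b) in enumerate(merges[: n_items - k]):
        new_id = n_items + step
        parent[find(a)] = new_id
        parent[find(b)] = new_id

    # Renumber the surviving roots 1..k in order of first appearance.
    numbering = {}
    return [numbering.setdefault(find(i), len(numbering) + 1) for i in range(n_items)]


# Four items: 0+1 -> id 4, 2+3 -> id 5, 4+5 -> id 6.
print(cut_tree(4, [(0, 1), (2, 3), (4, 5)], 2))
```

The labels depend only on which merges happened before the cut. This is why the choice of k and the quality of the early merges both matter.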
14. PROC TREE output: how many clusters do we think are appropriate? (The distance criterion value at the time of each merger is on the horizontal axis.) Panels: Ward's linkage and average linkage.
18. Comparison of CCC, pseudo-F and pseudo-t² across clustering runs that vary the distance, linkage and normalization. If we didn't have a low-dimensional variable set (p = 3), it would be impossible to build a case for average or single linkage. Panels:
- Euclidean distance, average linkage, ACECLUS-normalized.
- Ward linkage, ACECLUS-normalized (what we want to see).
- Single linkage, ACECLUS-normalized.
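Of the three criteria, pseudo-F is the simplest to reproduce by hand. It compares between-cluster and within-cluster sums of squares:

F = [(T - W)/(k - 1)] / [W/(n - k)]

Here T is the total sum of squares about the grand mean, and W is the pooled within-cluster sum of squares for k clusters. The sketch below computes only that quantity, using hypothetical names, plain Python and points as tuples. CCC and pseudo-t² are not attempted.

```python
def pseudo_f(points, labels):
    """Pseudo-F: [(T - W) / (k - 1)] / [W / (n - k)], where T is the total sum
    of squares about the grand mean and W the pooled within-cluster sum of
    squares for the k clusters given by `labels`."""
    n = len(points)
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("need 1 < number of clusters < number of points")

    def centroid(pts):
        return [sum(coord) / len(pts) for coord in zip(*pts)]

    def ss(pts):
        # Sum of squared Euclidean distances from each point to the centroid.
        c = centroid(pts)
        return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in pts)

    total = ss(points)
    within = sum(ss(pts) for pts in groups.values())
    if within == 0:
        raise ValueError("within-cluster sum of squares is zero")
    return ((total - within) / (k - 1)) / (within / (n - k))


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(pseudo_f(pts, [1, 1, 2, 2]))
```

A well-separated labelling gives a large pseudo-F. A labelling that splits the natural groups gives a small one, so the peaks of pseudo-F over k are what we look for.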
19. Birth rate vs. death rate. Notice the evidence of Ward's known bias toward clusters with equal numbers of observations, whereas average linkage allows some small clusters in the lower right. The Expected Maximum Likelihood (EML) method in PROC CLUSTER produces results similar to Ward's method, but with a slight bias in the opposite direction, toward clusters of unequal sizes. Panels:
- Ward linkage, ACECLUS-normalized.
- Euclidean distance, average linkage, ACECLUS-normalized.
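The sketch below runs naive agglomerative clustering with either Ward's merge cost or average linkage, to show where the two behaviours come from. It is an illustration under stated assumptions, not PROC CLUSTER:
- It uses plain Euclidean distance for average linkage.
- It breaks ties by taking the first pair found.
- It runs in roughly cubic time, so it only suits small examples.

The factor n_a n_b / (n_a + n_b) in the Ward cost grows with cluster size. That offers one intuition for Ward's preference for merging small clusters and its tendency toward equal sizes.

```python
from itertools import combinations
from math import dist


def agglomerate(points, method="ward"):
    """Naive agglomerative clustering (roughly cubic time; small data only).

    Returns the merge history as (members_a, members_b, cost) tuples, where
    cost is the Ward increase in within-cluster sum of squares or the
    average pairwise Euclidean distance between the two clusters.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(len(points[0]))]

    def cost(a, b):
        if method == "ward":
            na, nb = len(a), len(b)
            return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2
        if method == "average":
            total = sum(dist(points[i], points[j]) for i in a for j in b)
            return total / (len(a) * len(b))
        raise ValueError(f"unknown method: {method}")

    while len(clusters) > 1:
        # Greedy step: merge the cheapest pair; earlier merges are never undone.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        history.append((sorted(a), sorted(b), cost(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
    return history


pts = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
for method in ("ward", "average"):
    print(method, agglomerate(pts, method))
```

For Ward's method, the merge costs sum to the total sum of squares about the grand mean, since each cost is exactly the increase in within-cluster sum of squares.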
24. Here's an example of the risk of “bad” early merges in hierarchical agglomerative clustering: a small run on 8 items shows divergence in cluster membership. If the final number of clusters were 4, these two runs would give different results. Which would be best? A robust approach yields only slight differences in clustering. Bad approaches, however, can produce significant differences that are never undone as hierarchical agglomerative clustering proceeds.
25. MVN and outlier sensitivity of Ward's linkage: a test on a small 4-item sample shows the effect of ACECLUS normalization (left) versus no normalization (right) under Ward's linkage method. The clusterings differ somewhat.
26. METHOD=WARD in PROC CLUSTER (Johnson & Wichern, pp. 692-693)
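For reference, Ward's criterion can be written as the increase in the error sum of squares (ESS) caused by merging clusters K and L. At each step, the pair with the smallest increase is merged. The closed form on the right is a standard identity, written here in our own notation rather than copied from the cited pages.

```latex
\mathrm{ESS}_K = \sum_{j \in K} \lVert \mathbf{x}_j - \bar{\mathbf{x}}_K \rVert^2,
\qquad
\Delta_{KL} = \mathrm{ESS}_{K \cup L} - \mathrm{ESS}_K - \mathrm{ESS}_L
            = \frac{n_K n_L}{n_K + n_L} \lVert \bar{\mathbf{x}}_K - \bar{\mathbf{x}}_L \rVert^2
```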
29. We need a stopping criterion: what is the best number of clusters to use? We want neither too few clusters nor a merge that causes a rise in SPRSQ (semipartial R²). Annotations: large jump in SPRSQ; small increase in SPRSQ; intermediate increase in SPRSQ.
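SPRSQ for the merge of clusters K and L is B_KL / T. B_KL is the increase in within-cluster sum of squares caused by that merge, and T is the total (corrected) sum of squares. Below is a minimal sketch with hypothetical names and a made-up five-point example. It takes the merge history as pairs of member-index lists.

```python
def within_ss(points, members):
    """Sum of squared Euclidean distances from members to their centroid."""
    dims = range(len(points[0]))
    c = [sum(points[i][d] for i in members) / len(members) for d in dims]
    return sum(sum((points[i][d] - c[d]) ** 2 for d in dims) for i in members)


def sprsq(points, merges):
    """Semipartial R^2 (B_KL / T) for each merge in `merges`, a list of
    (members_K, members_L) index lists in merge order."""
    total = within_ss(points, list(range(len(points))))
    values = []
    for mk, ml in merges:
        b_kl = within_ss(points, mk + ml) - within_ss(points, mk) - within_ss(points, ml)
        values.append(b_kl / total)
    return values


pts = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
history = [([0], [1]), ([2], [3]), ([0, 1], [2, 3]), ([0, 1, 2, 3], [4])]
for step, value in enumerate(sprsq(pts, history), start=1):
    print(f"merge {step}: SPRSQ = {value:.4f}")
```

For these points the last merge has an SPRSQ of about 0.90. That is a large jump, which would argue for stopping at two clusters. Over a complete merge history, the SPRSQ values sum to 1.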
30. How to interpret the PROC CLUSTER raw output: the NAME and PARENT columns record each merge as follows.
- Bulgaria and Czechoslovakia have parent CL3.
- FormerEGermany and CL3 have parent CL2.
- Albania and CL2 have parent CL1.
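The NAME/PARENT reading above is just a parent-pointer tree, so recovering the members of any cluster takes a short recursive walk. The sketch below uses the slide's rows typed in by hand, with the CL naming. The (name, parent) tuple layout and the function name are hypothetical choices of ours.

```python
def leaves_under(rows, cluster):
    """Leaf names below `cluster`, given (name, parent) rows from a tree table."""
    children = {}
    for name, parent in rows:
        children.setdefault(parent, []).append(name)

    def walk(node):
        if node not in children:  # no children: an original observation
            return [node]
        found = []
        for child in children[node]:
            found.extend(walk(child))
        return found

    return walk(cluster)


# Rows transcribed from the slide as (NAME, PARENT).
rows = [
    ("Bulgaria", "CL3"), ("Czechoslovakia", "CL3"),
    ("FormerEGermany", "CL2"), ("CL3", "CL2"),
    ("Albania", "CL1"), ("CL2", "CL1"),
]
for cl in ("CL3", "CL2", "CL1"):
    print(cl, leaves_under(rows, cl))
```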
32. How to interpret the PROC TREE raw output: focus on the CLUSTER and CLUSNAME columns. The event for cluster 1 forms CL3, the event for cluster 2 adds FormerEGermany (FEG), and the event for cluster 3 adds Albania.
33. Prior to clustering we’ll use PROC ACECLUS to generate normalized variables: Can1~BirthRate, Can2~DeathRate, Can3~InfantDeathRate
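36. Minkowski distance, d(x, y) = (Σ_i |x_i - y_i|^m)^(1/m):
- m = 1 gives the sum of absolute differences, or “city block” distance.
- m = 2 gives the square root of the sum of squared differences, or Euclidean distance.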
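The formula translates directly into code. The small sketch below uses a hypothetical function name. It also rejects m < 1, where the triangle inequality fails and the result is no longer a metric.

```python
def minkowski(x, y, m):
    """Minkowski distance (sum_i |x_i - y_i|^m)^(1/m) for m >= 1."""
    if m < 1:
        raise ValueError("m must be >= 1 (for m < 1 the triangle inequality fails)")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** m for a, b in zip(x, y)) ** (1 / m)


print(minkowski((0, 0), (3, 4), 1))  # city block
print(minkowski((0, 0), (3, 4), 2))  # Euclidean
```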