2. What is Cluster Analysis?
It is a descriptive analysis technique that groups
objects (respondents, products, firms, variables,
etc.) so that each object is similar to the other
objects in its cluster and different from objects in
all other clusters.
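The "similar within, different across" criterion can be made concrete by comparing distances. Below is a minimal Python sketch. It assumes objects are described by numeric attributes and that Euclidean distance is an acceptable similarity measure, since the slide does not name one. The helper name and the six sample points are hypothetical.

```python
import math
from itertools import combinations

def within_between(points, labels):
    """Mean pairwise Euclidean distance within clusters and between clusters.

    Assumes at least one cluster has two or more members and there are at
    least two clusters, so neither list of distances is empty.
    """
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        # Same label -> within-cluster pair; different labels -> between-cluster pair.
        (within if labels[i] == labels[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)

# Hypothetical objects described by two numeric attributes, already grouped.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = [0, 0, 0, 1, 1, 1]
w, b = within_between(points, labels)
print(f"mean within-cluster distance:  {w:.2f}")
print(f"mean between-cluster distance: {b:.2f}")
```

A grouping that fits the definition should show a mean within-cluster distance well below the mean between-cluster distance. The two hypothetical groups here are far apart, so the criterion is clearly met.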
3. What is Cluster Analysis?
Cluster: a collection of data objects that are
• similar to one another within the same cluster
• dissimilar to the objects in other clusters
Cluster analysis: finding similarities between data
according to the characteristics found in the data and
grouping similar data objects into clusters
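The slide does not name a specific algorithm. One common way to find such groups is k-means (Lloyd's algorithm), sketched minimally below in Python. The sketch assumes numeric attributes, Euclidean distance, and a number of clusters k chosen in advance. The `kmeans` function and the sample data are illustrative and do not come from the source.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm): returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids (and therefore assignments) no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical customers described by two numeric attributes.
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # the first three and last three customers land in different clusters
```

Because k must be fixed up front and the result depends on the starting centroids, this is only one of several clustering approaches. It is shown here simply as an instance of "grouping similar data objects into clusters".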
4. When to use cluster analysis?
The essence of all clustering approaches is the classification of
data as suggested by “natural” groupings of the data themselves.
Simply put, use cluster analysis when the goal is:
• Taxonomy development (segmentation)
• Data simplification
• Relationship identification
Applications:
Cluster analysis is used to segment markets in marketing
and, on social networking sites, to form new groups
based on user data. Flickr's map of photos and other map
sites use clustering to reduce the number of markers
shown on a map.
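The source does not say how Flickr or other map sites cluster their markers. The sketch below uses grid-based clustering purely as an illustration: markers are bucketed into fixed-size latitude/longitude cells, and one marker is drawn per occupied cell. The `grid_cluster` helper, the cell size, and the coordinates are hypothetical.

```python
from collections import defaultdict

def grid_cluster(markers, cell_size):
    """Group (lat, lon) markers into square grid cells of `cell_size` degrees.

    Returns one (mean_lat, mean_lon, count) tuple per occupied cell, so a map
    can draw a single marker per cell instead of one per photo.
    """
    cells = defaultdict(list)
    for lat, lon in markers:
        # Floor division gives the index of the cell the marker falls in.
        key = (int(lat // cell_size), int(lon // cell_size))
        cells[key].append((lat, lon))
    clustered = []
    for members in cells.values():
        mean_lat = sum(m[0] for m in members) / len(members)
        mean_lon = sum(m[1] for m in members) / len(members)
        clustered.append((mean_lat, mean_lon, len(members)))
    return clustered

# Illustrative photo locations near Paris (3) and London (2).
photos = [(48.858, 2.294), (48.861, 2.336), (48.853, 2.349),
          (51.501, -0.142), (51.508, -0.076)]
for lat, lon, count in grid_cluster(photos, cell_size=1.0):
    print(f"marker at ({lat:.3f}, {lon:.3f}) stands for {count} photos")
```

Larger cells merge more markers. A real map would likely vary the cell size with the zoom level, so that zooming in splits a cluster back into individual markers.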
5. Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop
targeted marketing programs.
• Land use: Identification of areas of similar land use in an
earth observation database.
• Insurance: Identifying groups of motor insurance policy
holders with a high average claim cost.
• City planning: Identifying groups of houses according to
their house type, value, and geographical location.
• Earthquake studies: Observed earthquake epicenters
should be clustered along continental faults.
6. Assumptions for Cluster Analysis.
• Sufficient sample size is needed to ensure representativeness
of the population and its underlying structure, particularly
small groups within the population.
• Outliers can severely distort the representativeness of the
results if they appear as structure (clusters) that is
inconsistent with the research objectives.
• Representativeness of the sample: the sample must be
representative of the population the research question addresses.
• Impact of multicollinearity: input variables should be
examined for substantial multicollinearity. If it is present,
reduce the variables to equal numbers in each set of
correlated measures.
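The outlier and multicollinearity concerns above can be screened for before clustering. Below is a minimal Python sketch that assumes numeric input variables stored as equal-length lists. It flags variable pairs whose absolute Pearson correlation exceeds a chosen threshold, and values whose z-score exceeds a chosen cutoff. The thresholds, helper names, and survey data are hypothetical, and the source does not prescribe these particular tests.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def correlated_pairs(data, threshold=0.9):
    """Variable pairs whose absolute correlation exceeds `threshold`."""
    pairs = []
    for u, v in combinations(data, 2):
        r = pearson(data[u], data[v])
        if abs(r) > threshold:
            pairs.append((u, v, round(r, 3)))
    return pairs

def z_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    # A constant column (sigma == 0) has no outliers by this rule.
    return [i for i, v in enumerate(values) if sigma and abs(v - mu) / sigma > threshold]

# Hypothetical survey variables for eight respondents.
survey = {
    "income":       [30, 42, 55, 61, 48, 39, 70, 52],
    "spending":     [28, 40, 52, 60, 45, 37, 68, 50],  # tracks income closely
    "store_visits": [5, 3, 8, 2, 6, 4, 7, 1],
}
print(correlated_pairs(survey))
print(z_outliers([12, 14, 13, 15, 14, 13, 60, 12, 14, 13], threshold=2.5))
```

With the population standard deviation, no z-score in a sample of n values can exceed sqrt(n - 1). For that reason, the ten-value example uses a cutoff of 2.5 rather than the more usual 3. The screen only flags candidates. Deciding which correlated variables to drop or combine, so that each correlated set contributes equally, remains the analyst's call.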