2. What Is a Cluster, and What Is Cluster Analysis?
• A cluster is a group of similar objects (cases, points, observations,
members, customers, patients, locations, etc.).
• Cluster analysis is a set of data-driven partitioning techniques designed
to group a collection of objects into clusters such that the degree of
association (similarity) is strong between members of the same cluster and
weak between members of different clusters.
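The "strong within, weak between" criterion can be checked numerically. The sketch below uses the seven shopping-attitude cases from the data table in the "Formulate a problem" step and computes the average Euclidean distance for pairs in the same cluster and for pairs in different clusters. The grouping `GROUPS` is a hypothetical one chosen only to illustrate the definition, and the function names are illustrative.

```python
import math
from itertools import combinations

# Seven shopping-attitude cases (V1..V6) from the data table in this deck.
DATA = {
    1: (6, 4, 7, 3, 2, 3),
    2: (2, 3, 1, 4, 5, 4),
    3: (7, 2, 6, 4, 1, 3),
    4: (4, 6, 4, 5, 3, 6),
    5: (1, 2, 2, 2, 6, 4),
    6: (6, 4, 6, 3, 3, 4),
    7: (5, 3, 6, 3, 3, 4),
}

# Hypothetical grouping of the cases, used only for illustration.
GROUPS = [[1, 3, 6, 7], [2, 5], [4]]


def euclidean(p, q):
    """Square root of the summed squared differences over all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def within_between(data, groups):
    """Return (mean same-cluster distance, mean cross-cluster distance)."""
    label = {case: g for g, members in enumerate(groups) for case in members}
    within, between = [], []
    for a, b in combinations(sorted(data), 2):
        d = euclidean(data[a], data[b])
        # Same cluster label -> within pair; otherwise -> between pair.
        (within if label[a] == label[b] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


if __name__ == "__main__":
    w, b = within_between(DATA, GROUPS)
    print(f"within: {w:.3f}  between: {b:.3f}")
```

For this grouping the average within-cluster distance (about 2.52) is far smaller than the average between-cluster distance (about 6.78), which is what a good clustering should deliver.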
What Cluster Analysis Means in Marketing Research
Grouping similar customers and products is a fundamental marketing
concept. It is used, for example, in market segmentation.
Because companies cannot connect with all their customers individually, they
divide markets into groups of consumers, customers, or clients (called
segments) with similar needs and wants. Firms can then target one or more
of these segments and position themselves uniquely within them.
Ex: Ferrari in the high-end sports car market and the Alto in the
middle/affordable car market.
3. Uses of Cluster analysis in marketing
• Data reduction:
Information about an entire population
is reduced to the characteristics of a
few representative groups with minimal
loss of information. It is important to
choose the variables used for clustering
carefully and to consider how they
affect the research.
• Identifying new product
opportunities:
The company can determine the
extent to which a potential new
product is uniquely positioned
within the competitive sets of
other products.
Ex: Apple iMac and iPhone
• Understanding consumer behavior in the market:
Identifying homogeneous groups of customers, a process known as market
segmentation. Segments can be defined in terms of demographic, age,
financial, and other characteristics.
Ex: Identifying the attributes that matter when choosing a bank, and segmenting banks.
Ref: Tuma, M. N., Scholz, S. W., & Decker, R. (2009). The Application of Cluster Analysis in
Marketing Research: A Literature Analysis. B>Quest. University of West Georgia.
4. Conducting a cluster analysis
1. Formulate the problem
2. Select a distance measure
3. Select a clustering procedure
4. Decide on no. of clusters
5. Interpret and profile clusters
6. Assess the reliability & validity
5. 1. Formulate a problem
Perhaps the most important part of formulating the clustering problem is
selecting the variables on which the clustering is based. Inclusion of even
one or two irrelevant variables may distort an otherwise useful clustering
solution.
The set of variables selected should describe the similarity between objects in
terms that are relevant to the marketing research problem.
Ex: Clustering consumers based on their attitudes toward drinking and toward
shopping. Each statement is rated on a scale from 1 (disagree) to 7 (agree).
Drinking:
V1 Drinking is fun.
V2 Drinking is bad for your health.
V3 I combine drinking with eating out.
V4 I prefer drinking at parties.
V5 I prefer drinking high-end brands.
V6 I don’t care about drinking.
Shopping:
V1 Shopping is fun.
V2 Shopping is bad for your budget.
V3 I combine shopping with eating out.
V4 I try to get the best buys while shopping.
V5 I don’t care about shopping.
V6 You can save a lot of money by comparing prices.
Ratings given by seven respondents to the shopping statements:
Case No. V1 V2 V3 V4 V5 V6
1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 2 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
7. 2. Select a distance measure
Because the objective of clustering is to group similar objects together, some
measure is needed to assess how similar or different the objects are.
The most common approach is to measure similarity in terms of distance
between pairs of objects. Objects with smaller distances between them are
more similar to each other than are those at larger distances.
With a single variable, similarity is straightforward.
• Example: income. Two individuals are similar if their income levels are
similar, and dissimilarity increases as the income gap increases.
Multiple variables require an aggregate distance measure.
• With many characteristics (e.g., income, age, consumption habits, family
composition, car ownership, education level), it becomes more difficult
to define similarity with a single value.
The best-known measure of distance is the Euclidean distance.
The Euclidean distance is the square root of the sum of the squared
differences in values for each variable:
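$$D_{pq} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
where p_i and q_i are the values of variable i for objects p and q, and n is the number of variables.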
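The formula translates directly into code. The minimal Python sketch below applies it to cases from the shopping-attitude table; the names `CASES` and `euclidean` are illustrative, not taken from the source.

```python
import math

# Shopping-attitude ratings (V1..V6) for the seven cases in the data table.
CASES = {
    1: (6, 4, 7, 3, 2, 3),
    2: (2, 3, 1, 4, 5, 4),
    3: (7, 2, 6, 4, 1, 3),
    4: (4, 6, 4, 5, 3, 6),
    5: (1, 2, 2, 2, 6, 4),
    6: (6, 4, 6, 3, 3, 4),
    7: (5, 3, 6, 3, 3, 4),
}


def euclidean(p, q):
    """Square root of the sum of squared per-variable differences."""
    if len(p) != len(q):
        raise ValueError("both objects must be measured on the same variables")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


if __name__ == "__main__":
    print(euclidean(CASES[1], CASES[2]))            # 8.0
    print(round(euclidean(CASES[1], CASES[6]), 3))  # 1.732
```

Cases 1 and 6 (distance about 1.73) are therefore much more similar than cases 1 and 2 (distance 8.0), which matches a visual reading of their ratings.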
8. 3. Select a clustering procedure
Clustering procedures
• Hierarchical
   - Agglomerative
      - Linkage methods: single linkage, complete linkage, average linkage
      - Variance methods: Ward’s method
      - Centroid methods
   - Divisive
• Non-hierarchical
   - Sequential threshold
   - Parallel threshold
   - Optimizing partitioning
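As one concrete instance of the non-hierarchical branch, the sketch below implements plain k-means (commonly used as an optimizing-partitioning procedure) on the seven cases from the shopping table. It assumes k = 3 and uses cases 1, 2 and 4 as starting centroids; both are the analyst's choices and are assumptions here, since k-means results can depend on the starting points. Function names are illustrative.

```python
# Shopping-attitude ratings (V1..V6) for the seven cases in the data table.
DATA = [
    (6, 4, 7, 3, 2, 3),
    (2, 3, 1, 4, 5, 4),
    (7, 2, 6, 4, 1, 3),
    (4, 6, 4, 5, 3, 6),
    (1, 2, 2, 2, 6, 4),
    (6, 4, 6, 3, 3, 4),
    (5, 3, 6, 3, 3, 4),
]


def sq_dist(p, q):
    """Squared Euclidean distance (same ordering as the distance itself)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Per-variable mean (the centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def k_means(data, seeds, max_iter=100):
    """Assign each object to its nearest centroid, recompute the centroids,
    and repeat until no object changes cluster. `seeds` are the initial
    centroids, supplied by the analyst."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
            for x in data
        ]
        if new_labels == labels:
            break  # converged: assignment unchanged
        labels = new_labels
        for k in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = mean_point(members)
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical seeds: cases 1, 2 and 4 as starting centroids.
    labels, centroids = k_means(DATA, seeds=[DATA[0], DATA[1], DATA[3]])
    print(labels)  # [0, 1, 0, 2, 1, 0, 0]
```

With these seeds the assignment stabilizes after one centroid update: cases 1, 3, 6 and 7 form one cluster, cases 2 and 5 a second, and case 4 a third.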
13. 4. Decide on no. of clusters
A major issue in cluster analysis is deciding on the number of clusters.
Although there are no hard and fast rules, some guidelines are available.
1. Theoretical or practical considerations may suggest a certain number of
clusters. For example, if the purpose of clustering is to identify market
segments, management may want a particular number of clusters.
2. In hierarchical clustering, the distances at which clusters are
combined can be used as a criterion; these distances can be read from a dendrogram.
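The merge-distance criterion can be sketched by running agglomerative clustering, recording the distance at each merge, and stopping just before the largest jump. The sketch below uses single linkage on Euclidean distances for simplicity; the "largest jump" rule and the helper names `agglomerate` and `suggest_k` are illustrative assumptions, not a fixed rule from the source.

```python
import math

# Shopping-attitude ratings (V1..V6) for the seven cases in the data table.
DATA = [
    (6, 4, 7, 3, 2, 3),
    (2, 3, 1, 4, 5, 4),
    (7, 2, 6, 4, 1, 3),
    (4, 6, 4, 5, 3, 6),
    (1, 2, 2, 2, 6, 4),
    (6, 4, 6, 3, 3, 4),
    (5, 3, 6, 3, 3, 4),
]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def agglomerate(data, linkage=min):
    """Start with one cluster per object and repeatedly merge the closest
    pair. Cluster distance = `linkage` over all object-to-object distances:
    min -> single, max -> complete, statistics.mean -> average linkage.
    Returns the merge history as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(data))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage([euclidean(data[i], data[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
        clusters[a] = merged
    return history


def suggest_k(history, n_objects):
    """Cut just before the largest jump in merge distance (needs >= 3 objects)."""
    heights = [d for _, _, d in history]
    jumps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i_star = max(range(len(jumps)), key=jumps.__getitem__)
    return n_objects - (i_star + 1)


if __name__ == "__main__":
    history = agglomerate(DATA)  # single linkage
    for a, b, d in history:
        print([i + 1 for i in a], "+", [i + 1 for i in b], "at", round(d, 3))
    print("suggested clusters:", suggest_k(history, len(DATA)))  # 3
```

On the seven cases the merge distances are about 1.41, 1.73, 2.83, 2.83, 4.47 and 5.57. The largest jump (2.83 to 4.47) suggests three clusters: {1, 3, 6, 7}, {2, 5} and {4}.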
14. 5. Interpret and profile clusters
Mean rating of each variable within each cluster (cluster centroids):
Cluster No. V1 V2 V3 V4 V5 V6
1 5.750 3.625 6.000 3.125 1.750 3.875
2 1.667 3.000 1.833 3.500 5.500 3.333
3 3.500 5.833 3.333 6.000 3.500 6.000
V1 Shopping is fun.
V2 Shopping is bad for your budget.
V3 I combine shopping with eating out.
V4 I try to get the best buys while shopping.
V5 I don’t care about shopping.
V6 You can save a lot of money by comparing prices.
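One simple way to start profiling is to see which statements each cluster rates well above or below the others. The sketch below flags variables whose cluster mean differs by more than one scale point from the unweighted average of the three cluster means. The threshold and the unweighted average are assumptions for illustration (a true grand mean would weight clusters by size, which the table does not give), and `profile` is a hypothetical helper.

```python
# Cluster means from the table above (clusters 1-3; columns V1..V6).
MEANS = {
    1: (5.750, 3.625, 6.000, 3.125, 1.750, 3.875),
    2: (1.667, 3.000, 1.833, 3.500, 5.500, 3.333),
    3: (3.500, 5.833, 3.333, 6.000, 3.500, 6.000),
}
VARS = ("V1", "V2", "V3", "V4", "V5", "V6")


def profile(means, names, threshold=1.0):
    """For each cluster, return (high, low): variables whose mean lies more
    than `threshold` points above / below the unweighted average of the
    cluster means. A heuristic, not a formal test."""
    n_clusters = len(means)
    avg = [sum(m[j] for m in means.values()) / n_clusters
           for j in range(len(names))]
    result = {}
    for c, m in means.items():
        high = [names[j] for j in range(len(names)) if m[j] - avg[j] > threshold]
        low = [names[j] for j in range(len(names)) if avg[j] - m[j] > threshold]
        result[c] = (high, low)
    return result


if __name__ == "__main__":
    for cluster, (high, low) in profile(MEANS, VARS).items():
        print(cluster, "high:", high, "low:", low)
```

With the table's values, cluster 1 rates V1 and V3 high (shopping is fun, combined with eating out), cluster 2 rates V5 high (does not care about shopping), and cluster 3 rates V2, V4 and V6 high (budget-conscious, bargain-seeking).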
15. Applications of clustering in the real world
• Marketing: Helps marketers discover distinct groups in their customer
bases and then use this knowledge to develop targeted marketing programs
• Land use: Identification of areas of similar land use in an earth
observation database
• Insurance: Identifying groups of motor insurance policy holders with a
high average claim cost
• City-planning: Identifying groups of houses according to their house type,
value, and geographical location
• Earthquake studies: Observed earthquake epicenters should cluster
along continental faults