Sunday, 28 August 2011

Cluster analysis

Cluster analysis classifies a set of observations into two or more mutually exclusive groups that are not known in advance, based on combinations of interval variables. The purpose of cluster analysis is to discover a system for organizing observations, usually people, into groups whose members share common properties.
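To make the idea concrete, here is a minimal sketch of one common partitioning method, k-means, in plain Python. The post does not name a particular algorithm, so k-means itself, the function name `kmeans`, the farthest-first starting rule and the toy data are all illustrative assumptions rather than a prescribed procedure.

```python
import math

def _nearest(p, centroids):
    """Index of the centroid closest to observation p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, max_iter=100):
    """Partition interval-scaled observations into k mutually exclusive clusters (illustrative sketch)."""
    # Farthest-first start (an assumption): the first observation, then repeatedly
    # the observation farthest from every starting point chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # Assignment step: every observation joins exactly one cluster
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid where it was
        if updated == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = updated
    return [_nearest(p, centroids) for p in points], centroids

# Hypothetical observations measured on two interval variables
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each observation ends up with exactly one label, which is what "mutually exclusive" means here; note that the analyst, not the method, chooses k.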

We can use factor analysis to group variables according to shared variance. In factor analysis, we take several variables, examine how much variance these variables share and how much is unique, and then ‘cluster’ together variables that share the same variance. In short, we cluster together variables that look as though they explain the same variance.

In essence, cluster analysis is a similar technique, except that rather than grouping together variables, we are interested in grouping cases. Usually, in psychology at any rate, this means that we are interested in clustering groups of people. So, in a sense, it’s the opposite of factor analysis: instead of forming groups of variables based on several people’s responses to those variables, we group people based on their responses to several variables.
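The row-versus-column contrast can be seen on a single data matrix. In this sketch (the scores and function names are made up for illustration), the same table yields a person-by-person distance matrix, a common starting point for clustering cases, and an item-by-item correlation matrix, the usual starting point for factor analysis.

```python
import math

# Hypothetical questionnaire scores: each row is a person (case), each column an item (variable)
scores = [
    [5, 4, 1],
    [4, 5, 2],
    [1, 2, 5],
    [2, 1, 4],
]

def case_distances(rows):
    """Euclidean distance between every pair of people (rows): what cluster analysis groups on."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def variable_correlations(rows):
    """Correlation between every pair of items (columns): what factor analysis starts from."""
    cols = list(zip(*rows))
    return [[pearson(a, b) for b in cols] for a in cols]

print(round(case_distances(scores)[0][1], 3))         # 1.732
print(round(variable_correlations(scores)[0][2], 3))  # -1.0
```

People 1 and 2 (and likewise 3 and 4) lie close together, so clustering cases would group them; items 1 and 3 correlate perfectly negatively (item 3 is 6 minus item 1 in this toy data), which is the kind of shared variance factor analysis works with.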


Clustering for understanding- Classes, or conceptually meaningful groups of objects that share common characteristics, play an important role in how people analyze and describe the world. Human beings are skilled at dividing objects into groups (clustering) and assigning particular objects to these groups (classification).

Business- Businesses collect large amounts of information on current and potential customers. Clustering can be used to segment customers into a small number of groups for additional analysis and marketing activities.

Clustering for utility- Cluster analysis provides an abstraction from individual data objects to the clusters in which those data objects reside. Additionally, some clustering techniques characterize each cluster in terms of a cluster prototype, i.e., a data object that is representative of the other objects in the cluster. These cluster prototypes can be used as the basis for a number of data analysis or data processing techniques.
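Two common kinds of prototype are the centroid (the mean of the cluster's members, which need not be an actual observation) and the medoid (the member with the smallest total distance to the other members). The sketch below computes both for one hypothetical cluster; the function names and data are illustrative.

```python
import math

def centroid(members):
    """Mean of the members on each variable; need not coincide with a real observation."""
    return tuple(sum(col) / len(members) for col in zip(*members))

def medoid(members):
    """Member with the smallest total Euclidean distance to all members; always a real observation."""
    return min(members, key=lambda m: sum(math.dist(m, other) for other in members))

# Hypothetical cluster with one outlying member at (9, 9)
cluster = [(1, 1), (2, 2), (3, 1), (9, 9)]
print(centroid(cluster))  # (3.75, 3.25)
print(medoid(cluster))    # (2, 2)
```

The outlier pulls the centroid away from where most members sit, while the medoid stays on an actual case, which is one reason medoids are used when a prototype must be a real observation.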

Cluster analysis methods will always produce a grouping, but the groupings produced may or may not prove useful for classifying objects. If the groupings discriminate on variables that were not used to form them, and those discriminations are useful, then the cluster analysis is useful. For example, if grouping zip code areas into fifteen categories based on age, gender, education, and income discriminates between wine drinking behaviors, that would be very useful information to someone interested in expanding a wine store into new areas.
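One way to check this usefulness criterion is to profile the clusters on a variable that played no part in forming them. The sketch below uses made-up data (the cluster labels, the "bottles bought per month" figures and the function name are all hypothetical). It reports each cluster's mean on the outside variable and eta squared, the share of that variable's variance accounted for by cluster membership.

```python
from collections import defaultdict

def profile_external(labels, outside_var):
    """Per-cluster means of a variable NOT used to form the clusters, plus eta squared
    (between-cluster sum of squares divided by total sum of squares)."""
    groups = defaultdict(list)
    for lab, y in zip(labels, outside_var):
        groups[lab].append(y)
    grand = sum(outside_var) / len(outside_var)
    means = {lab: sum(ys) / len(ys) for lab, ys in groups.items()}
    ss_between = sum(len(groups[lab]) * (m - grand) ** 2 for lab, m in means.items())
    ss_total = sum((y - grand) ** 2 for y in outside_var)
    return means, ss_between / ss_total

# Hypothetical: cluster labels for six areas and bottles of wine bought per month
labels = [0, 0, 0, 1, 1, 1]
bottles = [2, 3, 1, 8, 9, 7]
means, eta_sq = profile_external(labels, bottles)
print(means)             # {0: 2.0, 1: 8.0}
print(round(eta_sq, 3))  # 0.931
```

An eta squared near 0 would suggest the grouping says little about the outside variable; in this toy example cluster membership accounts for about 93% of its variance.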

There are several things to be aware of when conducting cluster analysis:

1. The different methods of clustering usually give very different results. This occurs because of the different criteria for merging clusters (or for adding cases to clusters). It is important to think carefully about which method best suits what you are interested in looking at.

2. With the exception of single linkage, the results will be affected by the way in which the variables are ordered.

3. The analysis is not stable when cases are dropped: this occurs because the selection of a case (or the merger of clusters) depends on the similarity of one case to the cluster. Dropping one case can drastically change the course the analysis takes.

4. The hierarchical nature of the analysis means that early ‘bad judgements’ cannot be rectified.
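Points 1 and 4 of the list can be seen in a naive agglomerative clustering sketch. It starts with every case in its own cluster and repeatedly merges the closest pair, where "closest" depends on the linkage rule; once a merge is made, it is never revisited. The function name and the five one-dimensional cases are illustrative assumptions, and statistical packages use far more efficient algorithms than this one.

```python
import math

def agglomerate(points, n_clusters, linkage="single"):
    """Naive agglomerative hierarchical clustering (illustrative sketch).
    linkage='single':   cluster distance = closest pair of members
    linkage='complete': cluster distance = farthest pair of members
    Every merge is permanent, so an early poor merge can never be undone."""
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    combine = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]  # every case starts in its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical cases on one interval variable (indexed from 0)
cases = [(0.0,), (1.0,), (2.2,), (3.5,), (4.9,)]
print(agglomerate(cases, 2, "single"))    # [[0, 1, 2, 3], [4]]
print(agglomerate(cases, 2, "complete"))  # [[0, 1], [2, 3, 4]]
```

Same data, different criterion for merging, different answer: single linkage chains the first four cases together, while complete linkage splits them into two tighter groups.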

Cluster analysis methods are not clearly established. There are many options one may select when doing a cluster analysis using a statistical package. Cluster analysis is thus open to the criticism that a statistician may mine the data trying different methods of computing the proximities matrix and linking groups until he or she "discovers" the structure that he or she originally believed was contained in the data.
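How much can rest on the choice of proximity measure is easy to show with three cases. In this sketch (the cases and function names are hypothetical), case 0's nearest neighbour is case 1 under Euclidean distance but case 2 under city-block distance, so any method built on nearest neighbours can treat case 0 differently depending on a choice the analyst makes.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two cases."""
    return math.dist(p, q)

def city_block(p, q):
    """Sum of absolute differences between two cases."""
    return sum(abs(a - b) for a, b in zip(p, q))

def proximity_matrix(points, metric):
    """Pairwise distance matrix for a list of cases under the chosen metric."""
    return [[metric(p, q) for q in points] for p in points]

def nearest_neighbour(matrix, i):
    """Index of the case closest to case i, excluding i itself."""
    return min((j for j in range(len(matrix)) if j != i), key=lambda j: matrix[i][j])

# Hypothetical cases measured on two interval variables
cases = [(0, 0), (3, 3), (0, 5)]
print(nearest_neighbour(proximity_matrix(cases, euclidean), 0))   # 1
print(nearest_neighbour(proximity_matrix(cases, city_block), 0))  # 2
```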





Group- HR1

Blog Author- Harshada Thakurdesai (13080)
