## Monday, 29 August 2011

### Better understanding of Cluster Analysis

The second day began with a clearer understanding of the concepts we learnt yesterday and progressed to many new ones. Until yesterday, SPSS was for me a mechanical exercise in extracting outputs from a given set of data. After these sessions I realised the importance of analysing that output to support strategic business decisions.

As we learnt more about cluster analysis, delving into hierarchical clustering and then moving on to k-means clustering, I would like to discuss an example where I think the concepts can be useful.

Cluster analysis can be used to classify hedge funds, since there is no standard classification of ‘pure’ hedge fund types. Parameters that could serve as a basis for classifying hedge funds include:

- Asset class
- Region of investment
- Liquidity of investment strategy

K-means clustering can be used to classify hedge funds on the basis of the above-mentioned attributes. As a rule of thumb from the session, k-means is used when there are more than 50 objects, and hierarchical clustering when there are fewer than 50. K-means identifies the closest cluster centre (in terms of a distance measure) for each hedge fund and assigns the hedge fund to that cluster.
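The assignment step described above can be sketched in a few lines of Python. This is only a minimal illustration, not what SPSS does internally: it assumes each hedge fund has already been encoded as a pair of numeric scores (categorical attributes such as asset class or region would need to be converted to numbers first), and the data, the starting centres and the function names `nearest_centre`, `update_centres` and `k_means` are all made up for the example.

```python
import math

def nearest_centre(point, centres):
    """Return the index of the cluster centre closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centres]
    return distances.index(min(distances))

def update_centres(points, labels, centres):
    """Move each centre to the mean of the points currently assigned to it."""
    updated = []
    for k, old in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            updated.append(old)  # keep an empty cluster's centre where it was
        else:
            updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
    return updated

def k_means(points, centres, max_iter=100):
    """Alternate assignment and centre updates until the centres stop moving."""
    for _ in range(max_iter):
        labels = [nearest_centre(p, centres) for p in points]
        updated = update_centres(points, labels, centres)
        if updated == centres:
            break
        centres = updated
    return labels, centres

# Hypothetical funds: (liquidity score, equity exposure score), invented numbers.
funds = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
start = [(1.0, 1.0), (8.0, 8.0)]  # assumed starting centres, one per cluster

labels, centres = k_means(funds, start)
print(labels)
print(centres)
```

With these invented numbers the first three funds end up in one cluster and the last three in the other, because each fund is assigned to whichever centre is nearer.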

Now something about the 0s and 1s of the proximity matrix table...

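The logic of similarity and dissimilarity is a little confusing. Therefore the rule of thumb is:

In a dissimilarity (distance) matrix, a smaller value means the two objects are more alike, and there are 0s in the diagonal of the matrix because each object is at zero distance from itself.

In a similarity matrix (for a measure such as correlation), a larger value means the two objects are more alike, and there are 1s in the diagonal of the matrix because each object is perfectly similar to itself.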
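To make the diagonal values concrete, here is a minimal Python sketch that builds both kinds of proximity matrix for three made-up objects. It uses Euclidean distance as the dissimilarity measure and cosine similarity as the similarity measure; these choices, the data and the function names are assumptions for illustration only, and SPSS offers several other measures.

```python
import math

def dissimilarity_matrix(rows):
    """Pairwise Euclidean distances: 0 on the diagonal, smaller = more alike."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 when they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def similarity_matrix(rows):
    """Pairwise cosine similarities: 1 on the diagonal, larger = more alike."""
    return [[cosine_similarity(a, b) for b in rows] for a in rows]

# Three invented objects measured on two variables.
rows = [(1.0, 2.0), (2.0, 4.5), (9.0, 1.0)]

dis = dissimilarity_matrix(rows)
sim = similarity_matrix(rows)

for line in dis:
    print([round(v, 2) for v in line])
for line in sim:
    print([round(v, 2) for v in line])
```

In this example the first two objects are the most alike, so they have the smallest entry in the distance matrix and the largest off-diagonal entry in the similarity matrix.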

### Difference between Euclidean distance and squared Euclidean distance

Deriving the Euclidean distance between two data points involves computing the square root of the sum of the squares of the differences between corresponding values.

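The squared Euclidean distance uses the same equation as the Euclidean distance, but does not take the square root. As a result, it is slightly cheaper to compute, so clustering with squared Euclidean distance can be faster than clustering with regular Euclidean distance. Skipping the square root does not change which of two objects is nearer, but it does give more weight to objects that are far apart.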
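The two measures can be compared side by side in a short Python sketch. The points and function names below are invented for the example; the only claim being illustrated is that both measures rank neighbours in the same order, while the squared version skips the square root.

```python
import math

def squared_euclidean(a, b):
    """Sum of squared differences between corresponding values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

# Invented points: one reference object and three candidates.
query = (0.0, 0.0)
candidates = [(3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]

# Sort the candidates from nearest to farthest under each measure.
by_euclidean = sorted(candidates, key=lambda c: euclidean(query, c))
by_squared = sorted(candidates, key=lambda c: squared_euclidean(query, c))

print(euclidean(query, (3.0, 4.0)), squared_euclidean(query, (3.0, 4.0)))
print(by_euclidean == by_squared)
```

For the point (3, 4) the Euclidean distance from the origin is 5 and the squared Euclidean distance is 25, and both measures put the candidates in the same nearest-to-farthest order.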

Okay. That was from my understanding of today’s sessions!!

Author: Ankita Agarwal (13008)

Group: Finance_6