BA@SIBMB: K-Means Analysis and Its Applications in Finance

K-means clustering algorithm: The K-means method is widely used because it can process large data sets quickly. K-means clustering proceeds as follows. First, K observations are selected at random from the N observations, where K is the desired number of clusters; these become the initial cluster centers. Second, each of the remaining N − K observations is assigned to the nearest cluster in terms of Euclidean distance, and the center of that cluster is recomputed after each assignment. Finally, once all observations have been allocated, the Euclidean distance between each observation and every cluster center is calculated to confirm that each observation belongs to its nearest cluster. Any observation that is closer to another center is reassigned, the centers are recomputed, and this check is repeated until no observation changes cluster.
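The three steps above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the exact procedure of SPSS or any other package: the function name `kmeans`, the `seed` and `max_iter` parameters, and the choice to keep a cluster's old center if it ever becomes empty are all illustrative. Step 2 updates a cluster's center as a running mean immediately after each assignment, as described above; step 3 then repeats the check-and-reassign pass until no observation changes cluster.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """K-means following the three steps described above (illustrative sketch)."""
    X = np.asarray(X, dtype=float)  # shape (N, d): N observations, d features
    n = len(X)
    rng = np.random.default_rng(seed)

    # Step 1: choose K of the N observations at random as initial centers.
    init_idx = rng.choice(n, size=k, replace=False)
    centers = X[init_idx].copy()
    counts = np.ones(k)           # each center starts with one member
    labels = np.full(n, -1)       # -1 marks "not yet assigned"
    labels[init_idx] = np.arange(k)

    # Step 2: assign each remaining observation to its nearest center
    # (Euclidean distance) and immediately update that center's mean.
    for i in range(n):
        if labels[i] != -1:
            continue
        j = int(np.argmin(np.linalg.norm(centers - X[i], axis=1)))
        labels[i] = j
        counts[j] += 1
        centers[j] += (X[i] - centers[j]) / counts[j]  # running mean

    # Step 3: check every observation against every center, reassign any
    # that is closer to another center, recompute the means, and repeat
    # until no observation changes cluster.
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
         [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
    labels, centers = kmeans(X, k=2)
    print(labels)   # the first three and the last three points form the two clusters
    print(centers)
```

Because the initial centers are drawn at random, different seeds can produce different cluster numbering (and, on less well-separated data, different solutions), which is one reason the next section suggests choosing the number of clusters with a hierarchical method first.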

When should you use hierarchical (agglomerative) clustering, and when K-means?

In hierarchical clustering we can have as many clusters as there are cases, so the last step is to determine how many clusters are needed to represent the data. We do this by examining how similar the clusters are when additional clusters are created or existing ones are merged. In K-means clustering, by contrast, we specify the number of clusters in advance. The algorithm iteratively estimates the cluster means and assigns each case to the cluster whose mean is closest. In two-step clustering, which is designed to make large problems tractable, cases are first assigned to “preclusters”; in the second step, the preclusters are clustered using the hierarchical clustering algorithm.

The suggested approach is therefore:

1. First, apply a hierarchical method to determine the number of clusters.

2. Then, use the K-means procedure to form the clusters.

Sources: http://www.mvsolution.com/wp-content/uploads/SPSS-Tutorial-Cluster-Analysis.pdf

http://www.norusis.com/pdf/SPC_v13.pdf
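The two-step recipe above (hierarchical clustering to pick the number of clusters, then K-means to form them) can be sketched in Python with NumPy. Several choices here are our own assumptions rather than anything prescribed by the cited tutorials: a naive centroid-linkage agglomerative pass (SPSS offers other linkage methods), the common rule of thumb of stopping just before the largest jump in merge distance, and seeding K-means with the centroids of the hierarchical clusters. The function names are hypothetical, and the exhaustive pairwise search makes this suitable only for small data sets.

```python
import numpy as np


def agglomerative_merge_heights(X):
    """Naive centroid-linkage agglomerative clustering (hypothetical helper).

    Starts with every case in its own cluster and repeatedly merges the two
    clusters whose centroids are closest. Returns the merge distances in
    merge order and, for each merge, the clusters that existed just before it.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]
    heights, snapshots = [], []
    while len(clusters) > 1:
        snapshots.append([list(c) for c in clusters])
        centroids = np.array([X[c].mean(axis=0) for c in clusters])
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return heights, snapshots


def choose_k_and_seeds(X):
    """Step 1: read the number of clusters off the largest jump in merge distance.

    Needs at least three cases so that there are two merges to compare.
    """
    X = np.asarray(X, dtype=float)
    heights, snapshots = agglomerative_merge_heights(X)
    jumps = np.diff(heights)
    m = int(np.argmax(jumps))  # biggest jump is from merge m to merge m + 1
    groups = snapshots[m + 1]  # clusters that exist just before the costly merge
    seeds = np.array([X[g].mean(axis=0) for g in groups])
    return len(groups), seeds


def kmeans_from_seeds(X, centers, max_iter=100):
    """Step 2: standard (batch) K-means started from the hierarchical centroids."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    labels = None
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no case changed cluster, so the solution is stable
        labels = new_labels
        for j in range(len(centers)):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    # Toy data: three well-separated groups of three cases each.
    X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
         [5.0, 5.0], [5.2, 4.9], [4.9, 5.1],
         [0.0, 5.0], [0.1, 5.2], [0.2, 4.9]]
    k, seeds = choose_k_and_seeds(X)
    labels, centers = kmeans_from_seeds(X, seeds)
    print(k, labels)
```

Seeding K-means with the hierarchical centroids, rather than with randomly chosen cases, is a design choice that makes the final partition deterministic for a given data set; K-means can then still move cases between clusters if the hierarchical solution left any on the wrong side of a boundary.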

Applications:

1. One of the most important problems in modern finance is finding efficient ways to summarize and visualize stock market data so that individuals and institutions gain useful information about market behavior for their investment decisions. The enormous amount of valuable data generated by the stock market has attracted researchers to explore this problem using a variety of methodologies. The paper cited below investigates investment issues in the Taiwan stock market using a two-stage data mining approach, in which K-means cluster analysis is used to explore stock clusters and mine stock category clusters for investment information. On this basis, the paper proposes several possible portfolio alternatives for the Taiwan stock market under different circumstances.

The same approach could also be used in designing a virtual stock market application.

Source: http://dl.acm.org/citation.cfm?id=1379588

2. Segmentation of stock trading customers according to potential value:

While researching applications of K-means analysis in the financial sector (specifically, stock markets), I came across an article on the Korean stock market from the 1990s onward, a period in which the market was expanding. The authors use three clustering methods (K-means, self-organizing maps, and fuzzy K-means) to find appropriately graded brokerage commission rates based on customers' total trades over three months in two transaction modes (representative-assisted and online trading). The empirical results indicate that fuzzy K-means cluster analysis is the most robust approach for segmenting customers in both transaction modes.
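To make the difference between K-means and fuzzy K-means concrete, here is a minimal NumPy sketch of standard fuzzy c-means (Bezdek's formulation, often called fuzzy K-means) with fuzzifier m = 2. The article's exact variant, features, and trading data are not described here, so the function name, its parameters, and the toy data are hypothetical. Unlike K-means, every observation receives a membership degree in every cluster, and each observation's memberships sum to 1.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (fuzzy K-means); illustrative sketch.

    Returns U, an (N, c) matrix of membership degrees whose rows sum to 1,
    and the c cluster centers.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(max_iter):
        # Centers are membership-weighted means, with weights u_ij ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dists = np.fmax(dists, 1e-12)  # guard against a point sitting on a center
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = dists[:, :, None] / dists[:, None, :]
        U_new = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, centers


if __name__ == "__main__":
    # Hypothetical two-feature customer data (e.g., scaled trade volumes).
    X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
         [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
    U, centers = fuzzy_c_means(X, c=2)
    print(np.round(U, 3))     # membership degree of each customer in each segment
    print(U.argmax(axis=1))   # hard segment = cluster with the highest membership
```

A hard segment can still be read off as the cluster with the highest membership, but the membership degrees also show how firmly each customer belongs to that segment.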

3. Post-IPO corporate life cycle and takeovers:
The paper analysed here examines the impact of the corporate life cycle on acquisition likelihood; in essence, it discusses how the corporate life cycle shapes takeover strategies. The study uses corporate life cycle theories to investigate the motives and wealth effects of takeovers, classifying firms into three post-IPO stages using cluster analysis. Some of its findings are:

· Young-stage firms are more likely to be acquired when they are undervalued and have higher liquidity, less leverage, and lower free cash flow.

· Young firms’ acquisition likelihood is negatively related to the existence of golden parachutes and blank check provisions.

· Firms in the mature cluster are more likely to be acquired when they are undervalued, have less free cash flow, and have golden parachutes and supermajority amendments in place.

· However, the presence of a classified board reduces the likelihood of acquisition in mature firms.

· Finally, acquired firms in the old stage have higher free cash flow and more tangible assets than other targets and are less likely to have a supermajority amendment as a takeover defence.