Monday, 29 August 2011

Data Clusters

Today was my first class, and it wasn’t easy to start with, especially hearing the words "clusters", "hierarchical data clustering" and "k-means" for the first time. But after a while, it started to make sense.

Now what on earth is hierarchical data clustering? In simple words, it refers to grouping data into clusters in a hierarchical manner, so that the items within a cluster are similar to each other. The clusters are progressively merged into larger ones, so that no single element is left out at the end. The results can then be analysed through a dendrogram, which is a tree graph showing which clusters were joined and in what order. The agglomeration schedule gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
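
A minimal sketch of the merging process, under some assumptions that were not part of the class: the data are plain numbers, the distance between two clusters is the smallest gap between their members (single linkage), and the function names are made up for illustration. It returns the agglomeration schedule as a list of merges.

```python
from itertools import combinations


def linkage(a, b):
    """Single-linkage distance: the smallest gap between any two members."""
    return min(abs(x - y) for x in a for y in b)


def agglomeration_schedule(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns a list of (cluster_a, cluster_b, distance) tuples, one per merge.
    """
    clusters = [(p,) for p in points]  # every point starts as its own cluster
    schedule = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        schedule.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))  # the merged cluster
    return schedule


if __name__ == "__main__":
    for a, b, d in agglomeration_schedule([1.0, 2.0, 6.0, 7.0, 15.0]):
        print(a, "+", b, "at distance", d)
```

Each line printed corresponds to one row of an agglomeration schedule, and reading the merges from first to last is the same as reading a dendrogram from the leaves up to the root.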

K-means clustering. Here the algorithm assigns each point to the cluster whose center is the nearest, and the center of a cluster is taken as the average of all points in that cluster. These two steps are repeated until the assignments stop changing. Usually simple and fast, it is predominantly used for large data sets.
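
The assign-then-average loop can be sketched roughly as follows. This is only an illustration: the function names, the use of squared Euclidean distance, and passing the starting centers in by hand are all assumptions, not something from the class.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(group):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def k_means(points, centers, max_iter=100):
    """Alternate between assigning points and recomputing centers.

    `centers` holds the starting guesses (an assumption of this sketch).
    Returns the final centers and the points grouped by cluster.
    """
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: each center moves to the average of its points.
        # An empty cluster keeps its old center.
        new_centers = [mean_point(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:  # nothing moved, so we are done
            break
        centers = new_centers
    return centers, groups


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    final_centers, final_groups = k_means(data, [(0, 0), (10, 10)])
    print(final_centers)
    print(final_groups)
```

The result depends on the starting centers, which is why practical implementations usually try several starting points and keep the best run.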

There are various applications of clustering. It can be used in medicine and market research, and even to work out demographics for election purposes or to determine the ideal CD mix for a party. It takes many data points into consideration and places the ones that share the greatest correlation closest to each other.

So it really is quite a useful technique. Developed in the late 1960s, it has found great acceptance in almost every scientific field. It provides an invaluable aid in analysing long, convoluted surveys which never seem to make much sense. Data analysis is arguably more important than the data gathering itself: if one does not apply the right techniques and ideas, the answers may not be so forthcoming.


