Considering the fact that blogging does not really interest me, the very thought of it, that too as an assignment, viewable to the whole BA batch, makes me nervous. As I was looking for a topic to write on (based on what I understood in class), the idea of reflecting on the basics of hierarchical clustering struck me. So let's begin with the baby steps to understanding hierarchical clustering.
Hierarchical clustering is typically used when the dataset is small (roughly 50 observations or fewer), since it works on the full pairwise distance matrix, while K-means is preferred for larger datasets because it scales better. If, for example, a set of 9 items is to be clustered using a 9×9 distance matrix, then we should follow the steps given below:
a). Assign each item to its own cluster, such that if you have 9 items, you now have 9 clusters,
each containing just one item. Let the distances between the clusters equal the distances between the items they contain.
b). Find the closest (most similar) pair of clusters and merge them into a single cluster.
c). Compute distances (similarities) between the new cluster and each of the old clusters.
d). Repeat steps b and c until all items are merged into a single cluster of size 9 (a short sketch of this loop follows below).
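To make these steps concrete, here is a minimal from-scratch sketch in Python. The 9 two-dimensional points, the single-linkage choice, and all variable names are my own illustrative assumptions, not from the class material.

    # Made-up 2-D "items": three loose groups of three points each.
    items = [(1, 1), (1, 2), (2, 2), (8, 8), (8, 9), (9, 8), (15, 1), (15, 2), (16, 1)]

    def dist(p, q):
        # Euclidean distance between two points.
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    # Step a): every item starts in its own cluster.
    clusters = [[i] for i in range(len(items))]

    while len(clusters) > 1:
        # Step b): find the closest pair of clusters (single linkage:
        # distance between their two closest members).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(items[i], items[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Steps c) and d): merge the pair, then repeat until one cluster remains.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        print(f"merged at distance {d:.2f} -> {clusters}")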
Distances between clusters can be computed in several ways, viz., single-link, complete-link, or average-link clustering.
Single linkage: the distance between the two closest members of the two clusters.
Complete linkage: the longest distance from any member of one cluster to any member of the other cluster.
Average linkage: the average distance from any member of one cluster to any member of the other cluster.
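SciPy's scipy.cluster.hierarchy.linkage implements all three criteria, so, as a rough sketch, we can compare them on the same data; the array X and everything else here is made up for illustration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    # Made-up 2-D data: three loose groups of three points each.
    X = np.array([[1, 1], [1, 2], [2, 2],
                  [8, 8], [8, 9], [9, 8],
                  [15, 1], [15, 2], [16, 1]], dtype=float)

    # pdist returns the condensed (upper-triangle) form of the 9x9 distance matrix.
    D = pdist(X)

    # Run the same merge procedure under the three linkage criteria.
    for method in ("single", "complete", "average"):
        Z = linkage(D, method=method)
        # The last row of Z is the final merge; column 2 holds its distance.
        print(method, "-> final merge distance:", round(Z[-1, 2], 2))

Complete linkage always reports the largest final merge distance and single linkage the smallest, since they take the farthest and nearest member pairs respectively.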
The agglomeration schedule coefficients depend upon the distance between the two clusters being merged at each stage. For the dendrogram, we use average-linkage clustering.
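As a sketch of how this looks in code, SciPy's linkage matrix plays the role of the agglomeration schedule (each row records one merge and its coefficient), and dendrogram draws the tree from it; the dataset is the same made-up one as above.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import pdist

    # Same made-up 9-point dataset as in the earlier sketches.
    X = np.array([[1, 1], [1, 2], [2, 2],
                  [8, 8], [8, 9], [9, 8],
                  [15, 1], [15, 2], [16, 1]], dtype=float)

    # Each row of Z is one stage of the agglomeration schedule:
    # [cluster i, cluster j, merge distance (the coefficient), new cluster size].
    Z = linkage(pdist(X), method="average")
    print(Z)

    dendrogram(Z, labels=[str(i) for i in range(1, 10)])
    plt.ylabel("merge distance (coefficient)")
    plt.show()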
The cut-off line is drawn at the point where the distance between the two clusters being merged is the longest, i.e., where the merge distances show the biggest jump. Beyond that, we have the usage of crosstabs, the proximity matrix, boxplots (used to find the outliers), etc., to interpret and validate the solution.
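Here is a minimal sketch of applying such a cut-off programmatically with SciPy's fcluster; the threshold of 5.0 is an assumed value chosen for this toy data, not a general recommendation.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    # Same made-up 9-point dataset as above.
    X = np.array([[1, 1], [1, 2], [2, 2],
                  [8, 8], [8, 9], [9, 8],
                  [15, 1], [15, 2], [16, 1]], dtype=float)

    Z = linkage(pdist(X), method="average")

    # Cut the tree at distance 5.0: every merge above the cut-off line is
    # undone, and each item gets the label of the cluster it lands in.
    labels = fcluster(Z, t=5.0, criterion="distance")
    print(labels)  # for this toy data: three clusters of three items each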
Hope this adds to your understanding of hierarchical clustering...
Group- HR1
Author- Tage Otung