Hierarchical clustering
Hierarchical clustering creates a hierarchy of clusters which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations. Hierarchical clustering can be either agglomerative or divisive.
Agglomerative: This is a bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Once a cluster is formed it cannot be split; it can only be combined with other clusters, so cases never separate from a cluster they have joined.
Divisive: This is a top-down approach and is less commonly used. It works in the same way as agglomerative clustering but in the opposite direction: the method starts with a single cluster containing all observations and successively splits clusters until each observation is in a cluster of its own.
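As a rough illustration of the agglomerative (bottom-up) approach, the sketch below runs scikit-learn's AgglomerativeClustering on a small made-up data set; the data values and the choice of two clusters are assumptions used only for the example.

# A minimal sketch of agglomerative (bottom-up) clustering.
# The data set and the number of clusters are made up for illustration.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Six observations in two dimensions (two visually obvious groups).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Each observation starts in its own cluster; pairs of clusters are
# merged until only the requested number of clusters remains.
model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster numbering may differ)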
To form clusters using a hierarchical cluster analysis, you must select:
· A criterion for determining similarity or distance between cases
· A criterion for determining which clusters are merged at successive steps
· The number of clusters you need to represent your data
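These three choices map naturally onto the parameters of SciPy's hierarchical-clustering routines, as the hedged sketch below shows: the distance metric, the linkage (merge) criterion, and the number of clusters used to cut the resulting tree. The data and the particular parameter values are assumptions made for illustration only.

# Sketch: the three selections expressed as SciPy parameters (illustrative values).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.2],
              [5.1, 4.9], [9.0, 9.1], [8.9, 9.0]])

# Choice 1: similarity/distance between cases   -> metric='euclidean'
# Choice 2: which clusters merge at each step   -> method='average'
Z = linkage(X, method='average', metric='euclidean')

# Choice 3: how many clusters represent the data -> cut the tree into 3 clusters
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)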
Types of clusters:
· Well-separated clusters: Each point is closer to every point in its own cluster than to any point in another cluster
· Centre-based clusters: Each point is closer to the centre of its cluster than to the centre of any other cluster
· Contiguity-based clusters: Each point is closer to at least one point in its cluster than to any point in another cluster
· Density-based clusters: Clusters are regions of high density separated by regions of low density
· Conceptual clusters: Points in a cluster share some general property that derives from the entire set of points.
Distance between Cluster Pairs:
These methods define the distance between two clusters at each stage of the clustering procedure:
Nearest neighbor (single linkage): If you use the nearest neighbor method to form clusters, the distance between two clusters is defined as the smallest distance between two cases in different clusters.
Furthest neighbor (complete linkage): If you use the furthest neighbor method (also known as complete linkage), the distance between two clusters is defined as the largest distance between two cases in different clusters.
Ward’s method: For each cluster, the means for all variables are calculated. Then, for each case, the squared Euclidean distance to the cluster means is calculated. These distances are summed for all of the cases. At each step, the two clusters that merge are those that result in the smallest increase in the overall sum of the squared within-cluster distances. The coefficient in the agglomeration schedule is the within-cluster sum of squares at that step, not the distance at which clusters are joined.
Centroid method: This method calculates the distance between two clusters as the sum of distances between cluster means for all of the variables. In the centroid method, the centroid of a merged cluster is a weighted combination of the centroids of the two individual clusters, where the weights are proportional to the sizes of the clusters.
Median method: With this method, the two clusters being combined are weighted equally in the computation of the centroid, regardless of the number of cases in each. This allows small groups to have an equal effect on the characterization of larger clusters into which they are merged.
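To see how these definitions differ in practice, the sketch below runs SciPy's linkage routine with several of the methods described above on the same small, made-up data set and prints the coefficient at which the final two clusters join (a distance for most methods; for Ward's method, the sum-of-squares-based coefficient). The data values are illustrative assumptions.

# Sketch: comparing linkage definitions on the same (made-up) data.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.4],
              [4.0, 4.0], [4.3, 3.8], [6.0, 9.0]])

for method in ['single', 'complete', 'ward', 'centroid', 'median']:
    Z = linkage(X, method=method)  # centroid/median/ward assume Euclidean distance
    # Each row of Z records one merge; the third column is the merge coefficient,
    # so the last row shows the height at which the final two clusters join.
    print(f"{method:9s} final merge coefficient: {Z[-1, 2]:.3f}")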
K-means clustering:
This is the most popular partitioning method. You must specify the number of clusters to extract in advance. The procedure classifies a given data set into a fixed number of clusters (say k) chosen a priori. The main idea is to define k centroids, one for each cluster. Each point in the data set is then assigned to its nearest centroid; once every point has been assigned, an initial grouping is complete. The k centroids are then recalculated as the means of the clusters formed in the previous step, and the points are reassigned to their nearest new centroid. This loop repeats, with the centroids shifting at each iteration, until the assignments no longer change and the centroids stop moving. The algorithm thereby minimizes an objective function, in this case the squared error: the sum of squared distances between each point and the centroid of its cluster.
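The loop described above is often called Lloyd's algorithm. The minimal NumPy sketch below follows the same steps (assign each point to its nearest centroid, recompute the centroids, repeat until nothing moves) and reports the squared-error objective; the data set, the value of k, and the random initialization are assumptions made only for this example.

# Sketch of the k-means loop described above (Lloyd's algorithm), in NumPy.
# Data, k, and the initialization scheme are illustrative assumptions;
# a full implementation would also guard against empty clusters.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centroids (here: k random points from the data).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Objective: sum of squared distances of each point to its cluster centroid.
    sse = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, sse

X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [6.0, 6.0], [6.2, 5.8], [5.9, 6.1]])
labels, centroids, sse = kmeans(X, k=2)
print(labels, sse)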
Group: Marketing 6
Author: Soupa Soundararajan (13109)