Well, to start off with dendrograms: as a Biotechnology student I had a subject called bioinformatics, where we used the same techniques to classify organisms into clusters. Looks similar to what was taught in class, right? This made me think about clusters and dendrograms and how they are used. So here I will be discussing clusters and their usefulness.
Cluster analysis identifies groups of individuals or objects that are similar to each other but different from individuals in other groups. Using your customer base, you may be able to form clusters of customers who have similar buying habits or demographics. You can take advantage of these similarities to target offers to the subgroups that are most likely to be receptive to them. Based on scores on psychological inventories, you can cluster patients into subgroups that have similar response patterns, which may help you target appropriate treatments and study typologies of diseases. By analyzing the mineral content of excavated materials, you can study their origins and spread. Consider a few situations where clustering is useful:
· You need to identify people with similar patterns of past purchases so that you can tailor your marketing strategies.
· You’ve been assigned to group television shows into homogeneous categories based on viewer characteristics. This can be used for market segmentation.
· You want to cluster skulls excavated from archaeological digs into the civilizations from which they originated. Various measurements of the skulls are available.
· You’re trying to examine patients with a diagnosis of depression to determine if distinct subgroups can be identified, based on a symptom checklist and results from psychological tests.
You start out with a number of cases and want to subdivide them into homogeneous groups. First, you choose the variables on which you want the groups to be similar. Next, you must decide whether to standardize the variables in some way so that they all contribute equally to the distance or similarity between cases. Finally, you have to decide which clustering procedure to use, based on the number of cases and types of variables that you want to use for forming clusters.
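As a sketch of the standardization step, here is one common choice, z-scores, written with NumPy (the customer data and variable names are hypothetical, since the article doesn't specify any):

```python
import numpy as np

# Hypothetical data: 5 customers x 3 variables (age, income, purchases).
# Income's large scale would otherwise dominate any distance measure.
X = np.array([
    [25, 30000.0, 4],
    [47, 82000.0, 2],
    [31, 45000.0, 7],
    [52, 91000.0, 1],
    [29, 38000.0, 5],
])

# Standardize each column to mean 0 and standard deviation 1 (z-scores),
# so every variable contributes equally to the distance between cases.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this step, a one-unit difference means "one standard deviation" for every variable, whether it started out in years, dollars, or counts.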
For hierarchical clustering, you choose a statistic that quantifies how far apart (or similar) two cases are. Then you select a method for forming the groups. Because you can have as many clusters as you do cases (not a useful solution!), your last step is to determine how many clusters you need to represent your data. You do this by looking at how similar clusters are when you create additional clusters or collapse existing ones.
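To make these steps concrete, here is a minimal sketch using SciPy (the library, the toy data, and the Ward/Euclidean choices are my assumptions, not the article's): compute the hierarchy, then cut the tree at the number of clusters you settle on.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two visibly separated groups in two dimensions.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])

# Step 1: pick a distance (Euclidean) and a linkage method (Ward).
Z = linkage(X, method="ward", metric="euclidean")

# Step 2: decide how many clusters you need and cut the tree there.
labels = fcluster(Z, t=2, criterion="maxclust")
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree itself, which is the usual way to eyeball how many clusters are worth keeping.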
In k-means clustering, you select the number of clusters you want. The algorithm iteratively estimates the cluster means and assigns each case to the cluster for which its distance to the cluster mean is the smallest.
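The assign-and-update loop just described can be sketched in a few lines of NumPy (the toy data and the hand-picked initial centers are mine for determinism; real implementations seed the centers with something like k-means++):

```python
import numpy as np

def kmeans(X, init_centers, n_iter=100):
    """Minimal k-means sketch: alternately assign each point to its
    nearest center, then move each center to its cluster's mean."""
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for each point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as its cluster's mean.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                for j in range(len(centers))])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return labels, centers

# Toy data: two well-separated groups; centers seeded with one point
# from each group so the sketch converges deterministically.
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2],
              [6.0, 6.0], [6.2, 5.8], [5.9, 6.1]])
labels, centers = kmeans(X, init_centers=X[[0, 3]])
```

Note that, unlike hierarchical clustering, you must state the number of clusters up front: it is fixed by how many initial centers you pass in.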
There are numerous ways in which clusters can be formed. Hierarchical clustering is one of the most straightforward methods. It can be either agglomerative or divisive. Agglomerative hierarchical clustering begins with every case being a cluster unto itself; at successive steps, the most similar clusters are merged. Divisive clustering works the other way around: it starts with everybody in one cluster and ends up with everyone in individual clusters. In agglomerative clustering, once a cluster is formed, it cannot be split; it can only be combined with other clusters, so cases never separate from a cluster they have joined. Once in a cluster, always in that cluster.
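You can watch the agglomerative process step by step: SciPy's linkage matrix records one row per merge, showing which two clusters were joined, at what distance, and the size of the result (SciPy and the toy 1-D data are my choices for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four 1-D points: 0.0 and 0.5 are closest, so they merge first;
# once merged, they act as a single cluster from then on.
X = np.array([[0.0], [0.5], [4.0], [10.0]])
Z = linkage(X, method="single")

# Each row of Z is one merge: [cluster_a, cluster_b, distance, size].
for a, b, dist, size in Z:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.1f} -> size {int(size)}")
```

This merge history is exactly what a dendrogram draws: each row becomes one horizontal join in the tree, at a height equal to the merge distance.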
That covers the introduction and basics of clusters and how they are useful.
Author of the article- Rohita Sundru(13095)