Hierarchical clustering & Dendrograms
After learning about frequencies and cross tabs, we moved on to the next topic, Cluster Analysis. Clustering can broadly be divided into two types: hierarchical and k-means. Hierarchical clustering groups data at various scales, and the result is usually presented as a cluster tree, also known as a dendrogram. Hierarchical clustering can be further divided into divisive (top-down) clustering and agglomerative (bottom-up) clustering. This method is useful in many applications because it helps in identifying the most appropriate level of clustering.
The clustering process includes 3 steps:
· Selection of variables
· Distance measurement
· Clustering criteria
A proximity matrix can record either dissimilarity or similarity. In a dissimilarity (distance) matrix, a smaller value means the two items are more alike, and the diagonal values are zero, since each item is at zero distance from itself. In a similarity matrix it is the other way around: a larger value means the two items are more alike, and the diagonal values are one.
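As a small sketch of the distance-measurement step (with made-up data points), we can build a dissimilarity matrix with Euclidean distance and then turn it into a simple similarity matrix; note the zero diagonal in the first and the unit diagonal in the second:

```python
import math

# Toy data: three observations measured on two variables (assumed values)
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

def euclid(a, b):
    # Euclidean distance between two observations
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Dissimilarity (distance) matrix: diagonal is zero,
# and larger values mean a pair is less alike.
dissim = [[euclid(p, q) for q in points] for p in points]

# One simple similarity transform: 1 / (1 + distance),
# so identical items score 1 and the diagonal is one.
sim = [[1.0 / (1.0 + d) for d in row] for row in dissim]

print(dissim[0])                                # [0.0, 5.0, 10.0]
print([row[i] for i, row in enumerate(sim)])    # diagonal: [1.0, 1.0, 1.0]
```

The `1 / (1 + distance)` transform is just one illustrative way to convert distances into similarities; any monotone decreasing map with value 1 at distance 0 would behave the same way.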
In building a dendrogram, the first step is to identify which elements should be merged into a cluster. The two closest elements, according to the chosen distance, are merged, and the same process is repeated. Each successive agglomeration happens at a greater distance between clusters than the previous one, which lets us decide when to stop clustering (either when the clusters are too far apart to be merged, or when very few clusters remain).
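The repeated merging described above can be sketched in a few lines of Python. This is a minimal single-linkage agglomerative loop on assumed one-dimensional data (a real dendrogram would use a library such as SciPy); the recorded merge distances never decrease, which is exactly the property that tells us when to stop:

```python
# Toy one-dimensional observations (assumed values)
data = [1.0, 2.0, 9.0, 10.0, 25.0]

# Start with every observation in its own cluster.
clusters = [[x] for x in data]
merge_distances = []

def single_link(a, b):
    # Single linkage: distance between the closest pair of members.
    return min(abs(x - y) for x in a for y in b)

while len(clusters) > 1:
    # Find the two closest clusters under the chosen distance.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    merge_distances.append(single_link(clusters[i], clusters[j]))
    clusters[i] = clusters[i] + clusters.pop(j)

print(merge_distances)  # [1.0, 1.0, 7.0, 15.0] -- non-decreasing
```

Cutting the process before the large jump from 7.0 to 15.0 would leave two natural clusters, {1, 2, 9, 10} and {25}, illustrating how the merge distances suggest an appropriate stopping level.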
Example: Dance show Judges
Suppose a dance show is being judged by five judges, and each of the five judges gives each of ten groups four scores. Here we want to see which judges gave scores that were very close to one another. After identifying these clusters of judges, only the scores of the top four groups will be included.
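A sketch of the idea, with hypothetical judge names and scores: treat each judge's scores as a vector and compute pairwise distances; the smallest distance picks out the pair of judges whose scoring is most alike, and repeating the merge step would cluster all the judges:

```python
import math

# Assumed toy data: each judge's four scores for one dance group.
scores = {
    "J1": [8, 7, 9, 6],
    "J2": [8, 7, 9, 7],   # scores almost identical to J1
    "J3": [5, 9, 4, 8],
    "J4": [5, 8, 4, 7],   # close to J3
    "J5": [9, 3, 7, 2],
}

def dist(a, b):
    # Euclidean distance between two judges' score vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# All pairwise distances between judges; the smallest distance
# identifies the pair whose scoring is most alike.
names = list(scores)
pairs = [(dist(scores[a], scores[b]), a, b)
         for i, a in enumerate(names) for b in names[i + 1:]]
closest = min(pairs)
print(closest)  # (1.0, 'J1', 'J2')
```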
Clustering can also be used to identify communities on social networking sites like Facebook, Twitter, and Orkut, where each community forms part of a much larger group of people.
Group Name: Finance_3