Of CrossTabulations and Clusters

To begin, let me explain what cluster analysis is.

Cluster analysis is a statistical technique for data that exhibit natural groupings. A cluster is a group of homogeneous, or similar, observations. There are two major approaches to clustering:

· Hierarchical Clustering: This type of clustering groups data into a hierarchy of nested clusters and represents it as a tree-like structure called a dendrogram. It comes in two types:

o Agglomerative Clustering: A bottom-up approach in which every observation starts as its own cluster and the closest clusters are merged step by step until we reach the apex, a single overall cluster

o Divisive Clustering: A top-down approach in which we start with one overall cluster containing every observation and keep splitting it into smaller clusters until they cannot be divided any further

· K-Means Clustering: A form of non-hierarchical clustering in which the number of clusters, k, is chosen in advance and each observation is assigned to the cluster whose centroid (mean) is nearest. Its main advantage is simplicity, which has made it one of the most popular methods of partitioning data.
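To make the two approaches concrete, here is a minimal sketch in plain Python. It is my own illustration, not what SPSS does internally. The data (eight made-up basket values), the function names, the single-linkage merge rule and the evenly spaced starting centroids are all assumptions chosen to keep the example short and deterministic.

```python
from statistics import mean

# Hypothetical 1-D data: eight made-up basket values.
baskets = [12, 15, 14, 80, 85, 82, 40, 43]


def agglomerative_1d(points, target_clusters):
    """Bottom-up clustering: start with one cluster per point, then
    repeatedly merge the two closest clusters (single linkage, an assumption)."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two nearest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def k_means_1d(points, k, iterations=20):
    """K-Means: assign each point to the nearest centroid, recompute the
    centroids as cluster means, and repeat until nothing changes."""
    ordered = sorted(points)
    # Assumption: seed with k evenly spaced sorted points so the run is
    # deterministic; real tools usually pick the starting centroids differently.
    if k == 1:
        centroids = [mean(ordered)]
    else:
        step = (len(ordered) - 1) / (k - 1)
        centroids = [ordered[round(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


hier_clusters = agglomerative_1d(baskets, target_clusters=3)
centroids, kmeans_clusters = k_means_1d(baskets, k=3)

print("Agglomerative:", sorted(sorted(c) for c in hier_clusters))
print("K-Means:      ", sorted(sorted(c) for c in kmeans_clusters))
print("Centroids:    ", [round(c, 2) for c in centroids])
```

On data this cleanly separated, both routes should settle on the same three groups. On messier data they can disagree, and K-Means in particular depends on where the centroids start.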

In today's class, only my second time working with a tool like SPSS, I found out how easy data handling can be. Excel, of course, is the name that comes to mind immediately when somebody talks about data analysis, but even a first look suggests that the many resources SPSS provides can make sorting through data a cakewalk.

For example, today we worked with Frequencies and Crosstabs. Sounds simple, doesn't it? Only it doesn't seem so simple once you try to work it out yourself. Fortunately, SPSS provides tools like row and column percentages and chi-squared tests at the click of a button. Life made simple!
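For the curious, here is a rough idea of what happens behind that click, written in plain Python rather than SPSS. This is only a sketch under my own assumptions: the survey data is invented, the function names are hypothetical, and the statistic is the plain Pearson chi-squared without any continuity correction. The p-value shortcut used here holds only for a 2x2 table (one degree of freedom).

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical survey rows: (customer type, bought something?).
visits = (
    [("member", "yes")] * 30 + [("member", "no")] * 10
    + [("walk-in", "yes")] * 20 + [("walk-in", "no")] * 40
)


def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def row_percentages(table):
    """Express every cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def chi_squared(table):
    """Pearson chi-squared: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


rows, cols, table = crosstab(visits)
row_pct = row_percentages(table)
stat = chi_squared(table)
# Valid only for one degree of freedom, i.e. a 2x2 table.
p_value = erfc(sqrt(stat / 2))

print("Columns:", cols)
for name, counts, pct in zip(rows, table, row_pct):
    print(name, counts, [round(x, 1) for x in pct])
print("Chi-squared:", round(stat, 3), "p-value:", p_value)
```

A small p-value suggests that customer type and buying are related in this made-up sample, which is exactly the kind of question a crosstab is meant to answer.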

Come to think of it, a store manager might like to know why sales are decreasing, or increasing (the optimist that I always am). Running a store, the manager knows that thousands of people visit to buy different kinds of items. Not everybody can be asked for feedback, and not everybody's voice is heeded. What the manager could do instead is turn to SPSS and chart the relationship between a nagging problem for the store and its possible causes.

Speaking of real-life examples, I just looked up and saw a router with its light blinking constantly. Do you know how the Internet on campus works? By clustering. The Boys' Wing and the Girls' Wing have different net connection settings. Loosely speaking, they are nothing but clusters, and I like to think of them as K-Means clusters: nobody is given preference, and everybody connects through the same Internet gateway (I had to do a little research to find this term).

So at the end of the day, clustering is all around you. It's in the way you make your own choices, how you make friends and so on.

