## Sunday, 28 August 2011

### BA@SIBMB: Data and Cluster Analysis


Data becomes more useful once it is put through some kind of statistical analysis. One example is cross-tabulation, in which the multivariate frequency distribution of the variables is laid out in a table (called a contingency table) that shows how many sample units fall into each combination of categories, for example the gender of participants against their marital status. This is a basic analysis for seeing how the sample units are distributed across the categories.
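As a rough illustration of the gender versus marital status example, the sketch below builds a small contingency table with the Python standard library. The records and the helper name `cross_tab` are made up for this example and are not taken from any real survey.

```python
from collections import Counter

# Hypothetical survey records: (gender, marital_status) for each participant.
participants = [
    ("F", "married"), ("M", "single"), ("F", "single"),
    ("M", "married"), ("F", "married"), ("M", "single"),
]

def cross_tab(records):
    """Count how many records fall into each (row, column) combination."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = cross_tab(participants)
for gender, row in table.items():
    print(gender, row)
```

Each cell of `table` holds the frequency of one combination, which is all a contingency table is.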

Another tool, “Hypothesis testing”, can relate one attribute or property to another, provided both come from the same sample of data. The test checks whether an attribute is related or unrelated to another attribute of the sample. For example, a hypothesis could be that whether a customer disliked the service of a particular store depends on the customer's gender. The analysis can then be taken further to the type of purchase, i.e. whether the dissatisfied customers mostly bought apparel, footwear or groceries.
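One common way to test whether two categorical attributes are related is the chi-square test of independence. The post does not name a specific test, so this choice is an assumption. The sketch below computes the statistic for a 2x2 table of made-up counts and compares it with 3.841, the 5% critical value for one degree of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed_count - expected) ** 2 / expected
    return stat

# Made-up counts: rows are gender (F, M), columns are (disliked, did not dislike).
observed = [[30, 20], [15, 35]]
stat = chi_square_statistic(observed)

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, 5% level
reject_independence = stat > CRITICAL_5PCT_DF1
print(round(stat, 3), reject_independence)
```

With these illustrative counts the statistic exceeds the critical value, so the hypothesis that dislike is independent of gender would be rejected at the 5% level. Real data may of course behave differently.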

Data that records many properties of a sample can be examined down to the level of whether any two of the variables are related. For example, the age at first marriage could be related to income.
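For two numeric variables, one simple measure of such a relation is the Pearson correlation coefficient. This is offered as one possible approach rather than the method the post has in mind. The figures below are invented purely to show the calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented sample: income (in thousands) and age at first marriage.
income = [20, 30, 40, 50, 60]
age_first_marriage = [22, 24, 25, 27, 30]

r = pearson_r(income, age_first_marriage)
print(round(r, 3))
```

A value near +1 or -1 suggests a strong linear relation and a value near 0 suggests little or none. Correlation alone does not show that one variable causes the other.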

Another type of data analysis is “Cluster Analysis”, in which the data is divided into groups (clusters) so that items that are similar on the chosen attributes end up together, for example by hierarchical clustering.

The benefit is easiest to see from a marketing point of view. For example, a travel agency can cluster its customers into segments based on their interests and expectations: 1) the demanders, who want exceptional service and expect to be pampered; 2) the escapists, who want to get away and just relax; 3) the educationalists, who want to see new things, go to museums, go on a safari, or experience new cultures.

One type of clustering is hierarchical clustering, in which objects are organized into a hierarchical structure as part of the procedure. It comes in two forms:

- Divisive clustering - start by treating all objects as if they are part of a single large cluster, then divide the cluster into smaller and smaller clusters.

- Agglomerative clustering - start by treating each object as a separate cluster, then group them into bigger and bigger clusters.
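The agglomerative procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming single linkage (cluster distance is the distance between the closest members) and Euclidean distance, neither of which is specified in the post. The points and function names are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def single_linkage(a, b):
    """Distance between two clusters: distance between their closest members."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points, target_clusters):
    """Merge the two closest clusters repeatedly until target_clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Made-up 2-D points.
points = [(1, 1), (1, 2), (8, 8), (9, 8), (5, 1)]
result = agglomerate(points, 2)
print(result)
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, would give the full hierarchy. Divisive clustering runs in the opposite direction, starting from one cluster and splitting.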

An important step in most clustering is to select a distance measure, which determines how the similarity of two elements is calculated. This influences the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another.
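To make the effect of the distance measure concrete, the sketch below compares two common choices, Euclidean and Manhattan distance, on made-up points. These two measures are picked only as examples. The post does not say which ones it has in mind.

```python
def euclidean(p, q):
    """Straight-line distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

origin = (0, 0)
candidates = [(3, 3), (0, 5)]  # made-up points

# The point nearest to the origin depends on which measure is used.
nearest_euclidean = min(candidates, key=lambda p: euclidean(origin, p))
nearest_manhattan = min(candidates, key=lambda p: manhattan(origin, p))
print(nearest_euclidean, nearest_manhattan)
```

Under Euclidean distance (3, 3) is the nearer point, while under Manhattan distance (0, 5) is, so a clustering procedure could group the same elements differently depending on the measure chosen.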

--

Group Ops3