BA@SIBMB: DAY 1 at a glance !!

SPSS is used for data mining, for analyzing quantitative data and for analyzing the various variables that can affect a business. Its users include market researchers, health researchers, survey companies, government agencies, education researchers, marketing organizations and others.

Statistics included in the base software are as follows:

• Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics

• Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Non-parametric tests

• Prediction for numerical outcomes: Linear regression

• Prediction for identifying groups: Factor analysis, cluster analysis (two-step, K-means, hierarchical), Discriminant analysis

(http://en.wikipedia.org/wiki/SPSS)

SPSS datasets have a two-dimensional table structure. The rows typically represent cases (such as individuals or households), and the columns represent measurements, i.e. variables (such as age, sex or household income). Each variable can be given a name, a descriptive label and value labels. These are entered in the Variable View by filling in the corresponding cells.

The 'Data View' shows the rows and columns in a spreadsheet-like grid. Unlike a spreadsheet, however, the data cells can only contain numbers or text, and formulas cannot be stored in them. The 'Variable View' displays the metadata dictionary. Each of its rows represents one variable and shows the variable's name, variable label, value labels, print width, measurement type and a variety of other characteristics.
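The short Python sketch below makes the two views concrete. It is a toy model, not SPSS itself. The `Variable` class, the `display_value` helper and the example data are all hypothetical. The list of `Variable` objects plays the role of the Variable View, with one entry per variable holding its name, label, measurement level and value labels. `data` plays the role of the Data View, with one row per case and one coded cell per variable.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    """One row of a (hypothetical) Variable View: metadata for one column."""
    name: str
    label: str
    measure: str  # "nominal", "ordinal" or "scale"
    value_labels: dict = field(default_factory=dict)

# Variable View: one entry per variable (i.e. per column of the Data View).
variables = [
    Variable("age", "Age in years", "scale"),
    Variable("sex", "Sex of respondent", "nominal", {1: "Male", 2: "Female"}),
    Variable("income", "Household income", "scale"),
]

# Data View: one row per case; the cells hold plain numeric codes.
data = [
    [34, 1, 52000],
    [29, 2, 61000],
    [45, 2, 48000],
]

def display_value(var_index, raw):
    """Show the value label for a coded value if one is defined, else the raw value."""
    return variables[var_index].value_labels.get(raw, raw)

for case in data:
    print([display_value(i, v) for i, v in enumerate(case)])
```

In this model, the cells store codes such as 1 and 2, and the value labels carry their meaning. That is why the output can read "Male" and "Female" even though the data grid holds only numbers.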

To see how the values of a single variable are distributed, we run Frequencies: click Analyze, then Descriptive Statistics, then Frequencies. This is univariate (single-variable) analysis.
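A frequency table simply counts how often each value of one variable occurs. The plain-Python sketch below computes counts, percentages and cumulative percentages, roughly mirroring the Frequency, Percent and Cumulative Percent columns of the SPSS table. The `frequencies` helper and the survey responses are hypothetical. The sketch also assumes there are no missing values, so Percent and Valid Percent would be identical.

```python
from collections import Counter

def frequencies(values):
    """Return (value, count, percent, cumulative percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows

# Hypothetical responses for one survey question (a single variable).
responses = ["Agree", "Neutral", "Agree", "Disagree",
             "Agree", "Neutral", "Agree", "Disagree"]
for row in frequencies(responses):
    print(row)
```

Here "Agree" appears 4 times out of 8 (50.0%), and the cumulative percentage reaches 100.0 on the last row.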

For bivariate analysis, Crosstabs is used: click Analyze, then Descriptive Statistics, then Crosstabs, and assign one variable to Rows and the other to Columns. The chi-square test can also be requested by clicking the 'Statistics' button in the Crosstabs dialog and ticking 'Chi-square'. The output then includes Pearson's chi-square.
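The calculation that Crosstabs performs can be sketched in a few lines of Python. The helpers `crosstab` and `pearson_chi_square` and the data are hypothetical illustrations. The first helper counts each combination of the two variables. The second compares each observed count O with the count expected under independence, E = row total * column total / n. It then sums (O - E)^2 / E over all cells, with (rows - 1)(columns - 1) degrees of freedom.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Count how often each (row category, column category) pair occurs."""
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    counts = Counter(zip(row_values, col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical data: gender (row variable) vs. whether a product was bought.
gender = ["M", "M", "M", "F", "F", "F", "M", "F"]
bought = ["Yes", "Yes", "No", "No", "No", "Yes", "Yes", "No"]
rows, cols, table = crosstab(gender, bought)
print(rows, cols, table)
print(pearson_chi_square(table))
```

Every expected count here is 2, below the usual minimum of 5, so SPSS would flag the result in a footnote. The tiny sample only keeps the arithmetic easy to follow.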

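The output window shows the complete cross-tabulation of the two variables. The factor on which the hypothesis is tested is placed as the row variable. The chi-square statistic itself is the same whichever way round the variables are placed. If the significance value (p-value) of the Pearson chi-square is less than 0.05, the null hypothesis that the two variables are independent is rejected. In other words, the data support the hypothesis that the two variables are related.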
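The significance value is the upper-tail probability of the chi-square distribution at the computed statistic. The sketch below computes this probability and applies the 0.05 rule. It evaluates the regularized incomplete gamma function with the classic series / continued-fraction method. That numerical method is swapped in here for illustration; it is not necessarily the routine SPSS uses. The function names and the example statistic are hypothetical.

```python
import math

def chi_square_p_value(chi2, df, eps=1e-14, max_iter=500):
    """Upper-tail p-value P(X >= chi2) for a chi-square variable with df degrees
    of freedom, i.e. the regularized upper incomplete gamma Q(df/2, chi2/2)."""
    a, x = df / 2.0, chi2 / 2.0
    if x <= 0:
        return 1.0
    if x < a + 1:
        # Series for the lower function P(a, x); the p-value is 1 - P.
        term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))
        total, n = term, 0
        while term > eps * total and n < max_iter:
            n += 1
            term *= x / (a + n)
            total += term
        return max(0.0, 1.0 - total)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        step = d * c
        h *= step
        if abs(step - 1) < eps:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * h

def chi_square_decision(chi2, df, alpha=0.05):
    """Apply the usual rule: reject independence when p < alpha."""
    p = chi_square_p_value(chi2, df)
    if p < alpha:
        return p, "reject H0: the two variables appear to be associated"
    return p, "fail to reject H0: no evidence of association at this level"

# Hypothetical Crosstabs result: Pearson chi-square = 2.0 with 1 df.
p, verdict = chi_square_decision(2.0, 1)
print(f"p = {p:.3f}: {verdict}")
```

A statistic of 2.0 with 1 degree of freedom gives p of about 0.157. That is above 0.05, so independence is not rejected. A statistic of 6.0 with 2 degrees of freedom gives p of about 0.0498, which falls just below the threshold.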

In this way, the relationship between two categorical variables can be examined and a hypothesis about it tested.

Cluster analysis is the process of grouping a set of observations so that observations in the same group (cluster) are more similar to one another than to those in other groups.

There are different kinds of clustering:

• Hierarchical clustering: finds successive clusters using previously established clusters. These algorithms are usually either bottom-up (agglomerative) or top-down (divisive).

• Divisive clustering (top-down): We start at the top with all observations in one cluster. The cluster is split using a flat clustering algorithm, and the procedure is applied recursively until each observation is in its own cluster.

• Agglomerative clustering (bottom-up): Bottom-up algorithms treat each observation as a singleton cluster at the outset. They then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all observations.
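The bottom-up procedure can be sketched in plain Python. This minimal illustration assumes Euclidean distance and average linkage, where the distance between two clusters is the mean of all pairwise distances between their members. The `average_linkage` function and the five points are hypothetical. The function records every merge and the distance at which it happened, which is the information a dendrogram draws.

```python
import math
from itertools import combinations

def average_linkage(points):
    """Agglomerative (bottom-up) clustering with average linkage.

    Starts with every point as its own cluster and repeatedly merges the two
    closest clusters until one remains. Returns the merge history as
    (members_a, members_b, distance) tuples."""
    def cluster_distance(a, b):
        # Mean of all pairwise Euclidean distances between the two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (ia < ib by construction).
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        a, b = clusters[ia], clusters[ib]
        merges.append((sorted(a), sorted(b), cluster_distance(a, b)))
        # Replace the two clusters by their union; delete the higher index first.
        del clusters[ib]
        del clusters[ia]
        clusters.append(a + b)
    return merges

# Hypothetical 2-D observations.
pts = [(0, 0), (1, 0), (5, 0), (7, 0), (20, 0)]
for merge in average_linkage(pts):
    print(merge)
```

For these points, the merges happen at distances 1.0, 2.0, 5.5 and 16.75. The two nearby pairs join first, then the two pairs join each other, and finally the outlying point at (20, 0) joins the rest. The brute-force search is fine for a handful of points but would be slow on real datasets.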

The clustering process involves:

• Selection of variables

• Distance measurement

• Clustering criteria
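The first two steps can be sketched as follows. Selecting variables here just means choosing which columns (age and income) to feed in. The sketch then converts those columns to z-scores, an assumed but common preprocessing choice, so that income does not swamp age simply because it is measured in larger units. Finally, it computes a squared Euclidean distance matrix. A clustering criterion, such as a linkage rule deciding the next merge, would then be applied to this matrix. All names and data are hypothetical.

```python
from statistics import mean, stdev

def standardize(rows):
    """Convert each column (variable) to z-scores so that variables measured on
    different scales contribute comparably to the distances."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def squared_euclidean(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distance_matrix(rows, metric=squared_euclidean):
    """Pairwise distances between all cases."""
    return [[metric(a, b) for b in rows] for a in rows]

# Hypothetical cases: (age, income, customer_id). The id carries no information
# about similarity, so variable selection drops it.
customers = [(25, 30000, 101), (27, 32000, 102), (50, 90000, 103)]
selected = [[age, income] for age, income, _ in customers]
z = standardize(selected)
d = distance_matrix(z)
for row in d:
    print([round(v, 3) for v in row])
```

Customers 101 and 102 end up much closer to each other than either is to customer 103. This matches the intuition from the raw numbers.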

Dendrograms:

One product of hierarchical cluster analysis is a tree diagram, called a dendrogram, that represents the entire process of going from individual points to one big cluster. It shows the arrangement of the clusters produced by hierarchical clustering, including the distance at which each pair of clusters was merged. In business analytics, dendrograms are often used to illustrate the clustering of similar variables or samples.
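A dendrogram is just the merge history drawn as a tree, so "cutting" it at some height gives a flat set of clusters. The sketch below is hypothetical. It assumes merges are listed as (members_a, members_b, height) in non-decreasing height order. The largest-gap rule it uses to suggest a cut height is one common heuristic, not something prescribed by SPSS.

```python
def cut_dendrogram(merges, n_points, height):
    """Return the flat clusters obtained by cutting the tree at `height`.

    `merges` holds (members_a, members_b, merge_height) tuples in
    non-decreasing height order, as produced by agglomerative clustering."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for members_a, members_b, h in merges:
        if h > height:
            break  # every later merge happens higher up the tree
        a, b = frozenset(members_a), frozenset(members_b)
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)

def largest_gap_cut(merges):
    """Heuristic: cut midway across the largest jump between successive merge heights."""
    heights = [h for _, _, h in merges]
    gaps = [later - earlier for earlier, later in zip(heights, heights[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return (heights[i] + heights[i + 1]) / 2

# Hypothetical merge history for five points labelled 0-4.
merges = [([0], [1], 1.0), ([2], [3], 2.0), ([0, 1], [2, 3], 5.5), ([4], [0, 1, 2, 3], 16.75)]
print(cut_dendrogram(merges, 5, 3.0))
cut = largest_gap_cut(merges)
print(cut, cut_dendrogram(merges, 5, cut))
```

Cutting at 3.0 applies only the first two merges and leaves three clusters. The largest jump in merge height, from 5.5 to 16.75, suggests a cut at 11.125, which leaves two clusters.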

Looking at the dendrogram can help in deciding how many clusters to keep. It provides three key pieces of information:

• Weight - the rough percentage of all individuals that fall within each cluster

• Compactness - how similar to one another the elements of a cluster are

• Distinctness - how different one cluster is from its closest neighbor
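These three properties are normally judged by eye from the dendrogram. As an assumption, the sketch below puts one possible number on each for a given set of clusters:

- Weight: the percentage of points in the cluster.
- Compactness: the mean distance of members to their centroid, where smaller means more compact.
- Distinctness: the distance from the cluster's centroid to the nearest other centroid.

The function names, points and cluster assignment are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def cluster_profile(points, clusters):
    """For each cluster (a list of point indices) return
    (weight %, compactness, distinctness). Needs at least two clusters."""
    n = len(points)
    centroids = [centroid([points[i] for i in c]) for c in clusters]
    profile = []
    for k, members in enumerate(clusters):
        weight = 100.0 * len(members) / n
        # Average distance of members to their own centroid (smaller = tighter).
        compactness = sum(math.dist(points[i], centroids[k]) for i in members) / len(members)
        # Distance to the closest other cluster's centroid (larger = more distinct).
        distinctness = min(math.dist(centroids[k], centroids[j])
                           for j in range(len(clusters)) if j != k)
        profile.append((weight, compactness, distinctness))
    return profile

# Hypothetical points and a three-cluster solution.
pts = [(0, 0), (1, 0), (5, 0), (7, 0), (20, 0)]
clusters = [[0, 1], [2, 3], [4]]
for members, stats in zip(clusters, cluster_profile(pts, clusters)):
    print(members, stats)
```

With this solution, the single point at (20, 0) carries only 20% of the weight but is by far the most distinct cluster.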

A hierarchical clustering dendrogram would look like this: