BA@SIBMB: Business Analytics

Descriptive Statistics

Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data. In finance, for example, they can be used to analyze the price movement of a stock or the risk and return associated with a portfolio. The following video shows how descriptive analysis works in SPSS: http://www.youtube.com/watch?v=4CWeHF3Mn00

The Frequencies procedure can be used to determine quartiles, percentiles, measures of central tendency (mean, median, and mode), measures of dispersion (range, standard deviation, variance, minimum, and maximum), and measures of distribution shape (kurtosis and skewness), and to create histograms. The command is found at Analyze | Descriptive Statistics | Frequencies (shorthand for clicking the Analyze menu at the top of the window, then Descriptive Statistics in the drop-down menu, and then Frequencies in the pop-up menu).
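To make these measures concrete outside SPSS, here is a minimal Python sketch using only the standard library. The price series and the function name describe are hypothetical, chosen only for illustration. Quartiles use Python's default interpolation rule, which may not match the SPSS output exactly, and skewness and kurtosis are left out because the standard library does not provide them.

```python
import statistics

# Hypothetical daily closing prices for one stock (illustrative values only).
prices = [102.0, 104.5, 103.0, 104.5, 107.0, 101.5, 104.5, 108.0]

def describe(values):
    """Return common summary measures for one numeric variable."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "quartiles": (q1, q2, q3),
    }

summary = describe(prices)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function could be applied to a series of portfolio returns, where the standard deviation is often read as a simple measure of risk.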

Crosstabs is an SPSS procedure that cross-tabulates two variables, displaying their relationship in tabular form. In contrast to Frequencies, which summarizes information about one variable, Crosstabs generates information about bivariate relationships. Crosstabs creates a table that contains a cell for every combination of categories in the two variables.

Inside each cell is the number of cases that fit that particular combination of responses.
SPSS can also report the row, column, and total percentages for each cell of the table.
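The idea of a cross-tabulation can be sketched in a few lines of Python. This is not how SPSS implements Crosstabs; the survey records, category labels, and function names below are hypothetical. The sketch counts the cases in each cell and then computes row percentages.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred investment). Illustrative only.
records = [
    ("F", "Equity"), ("M", "Debt"), ("F", "Debt"), ("M", "Equity"),
    ("F", "Equity"), ("M", "Equity"), ("F", "Debt"), ("M", "Equity"),
]

def crosstab(pairs):
    """Count the cases for every combination of row and column category."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Empty combinations get a count of 0, so every cell is present.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100.0 * n / total for c, n in cells.items()}
    return result

table = crosstab(records)
row_pct = row_percentages(table)
for r in table:
    print(r, table[r], row_pct[r])
```

Column and total percentages follow the same pattern, dividing each cell by its column total or by the overall number of cases.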

Because Crosstabs creates a row for each value in one variable and a column for each value in the other, the procedure is not suitable for continuous variables that assume many values. Crosstabs is designed for discrete variables--usually those measured on nominal or ordinal scales. For more on Crosstabs please use the link given below.

http://www.youtube.com/watch?v=IRCzOD27NQU

Cluster Analysis

SPSS has three different procedures that can be used to cluster data: hierarchical cluster analysis, k-means cluster, and two-step cluster. If you have a large data file (even 1,000 cases is large for clustering) or a mixture of continuous and categorical variables, you should use the SPSS two-step procedure. If you have a small data set and want to easily examine solutions with increasing numbers of clusters, you may want to use hierarchical clustering. If you know how many clusters you want and you have a moderately sized data set, you can use k-means clustering.

Hierarchical clustering is one of the most straightforward methods. It can be either agglomerative or divisive. Agglomerative hierarchical clustering begins with every case as a cluster unto itself and, at each step, merges the two closest clusters until all cases belong to a single cluster. Divisive clustering works in the opposite direction: it starts with all cases in one cluster and ends with every case in its own cluster.
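The agglomerative procedure can be sketched in plain Python. This is a toy illustration, not the SPSS algorithm: the five cases are hypothetical, and the sketch assumes Euclidean distance with the nearest-neighbour (single linkage) rule for the distance between clusters.

```python
import math

# Hypothetical cases measured on two continuous variables (illustrative only).
cases = {"A": (1.0, 1.0), "B": (1.5, 1.0), "C": (1.0, 2.0),
         "D": (8.0, 8.0), "E": (8.5, 8.0)}

def single_linkage(points, c1, c2):
    """Cluster distance = distance between the closest pair of cases."""
    return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

def agglomerate(points, target_clusters=1):
    """Merge the two closest clusters until target_clusters remain."""
    clusters = [(name,) for name in sorted(points)]  # each case starts alone
    history = []                                     # (cluster, cluster, distance)
    while len(clusters) > target_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Pick the pair of clusters with the smallest distance.
        i, j = min(pairs, key=lambda p: single_linkage(
            points, clusters[p[0]], clusters[p[1]]))
        distance = single_linkage(points, clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history

final_clusters, history = agglomerate(cases, target_clusters=2)
for left, right, distance in history:
    print(f"merge {left} + {right} at distance {distance:.3f}")
print("final clusters:", final_clusters)
```

The merge history records the distance at which each pair of clusters was combined. Running the loop down to a single cluster gives the full sequence of solutions, from one cluster per case to one cluster overall.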

Distance is a measure of how far apart two objects are, while similarity measures how alike two objects are. For cases that are alike, distance measures are small and similarity measures are large. There are many different definitions of distance and similarity. Some, like the Euclidean distance, are suitable only for continuous variables, while others are suitable only for categorical variables. There are also many specialized measures for binary variables. If you want a visual representation of the distance at which clusters are combined, you can look at a display called the dendrogram.
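As a small illustration of the two kinds of measure, the Python sketch below computes the Euclidean distance for continuous variables and a simple matching similarity for binary variables. Simple matching is an assumption here, picked as one common binary measure; the text does not single out a particular one. The function names and sample values are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two cases on continuous variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def simple_matching_similarity(x, y):
    """Share of binary variables on which two cases agree (1.0 = identical)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

# Continuous example: two hypothetical cases on two variables.
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))

# Binary example: two hypothetical cases on four yes/no variables.
s = simple_matching_similarity((1, 0, 1, 1), (1, 0, 0, 1))

print("distance:", d)
print("similarity:", s)
```

Identical cases give a distance of 0 and a similarity of 1, which matches the rule that alike cases have small distances and large similarities.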

The proximity matrix shows the distance (for example, the Euclidean distance) between each pair of cases. The distance between a pair of clusters can be measured by different methods: nearest neighbour (single linkage), furthest neighbour (complete linkage), average linkage between groups (the unweighted pair-group method using arithmetic averages), average linkage within groups, Ward's method, the centroid method, and the median method. For more on cluster analysis please use the link given below.

http://www.youtube.com/watch?v=YYObOp8GJ8M
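The proximity matrix and three of the linkage rules named above can be sketched in Python as follows. The four cases and all function names are hypothetical, and Euclidean distance is assumed. Ward's method and the centroid and median methods are omitted to keep the sketch short.

```python
import math
import statistics

# Hypothetical cases on two continuous variables (illustrative only).
points = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (4.0, 0.0), "D": (4.0, 3.0)}

def proximity_matrix(pts):
    """Euclidean distance between every pair of cases."""
    names = sorted(pts)
    return {a: {b: math.dist(pts[a], pts[b]) for b in names} for a in names}

def pair_distances(prox, c1, c2):
    """All case-to-case distances between two clusters."""
    return [prox[a][b] for a in c1 for b in c2]

def nearest_neighbour(prox, c1, c2):
    """Single linkage: distance of the closest pair."""
    return min(pair_distances(prox, c1, c2))

def furthest_neighbour(prox, c1, c2):
    """Complete linkage: distance of the most distant pair."""
    return max(pair_distances(prox, c1, c2))

def average_between_groups(prox, c1, c2):
    """Average linkage between groups: mean of all between-cluster pairs."""
    return statistics.mean(pair_distances(prox, c1, c2))

prox = proximity_matrix(points)
left, right = ("A", "B"), ("C", "D")
single = nearest_neighbour(prox, left, right)
complete = furthest_neighbour(prox, left, right)
average = average_between_groups(prox, left, right)
print(single, complete, average)
```

For the same pair of clusters, the single linkage distance is never larger than the average, and the average is never larger than the complete linkage distance. This is one reason different methods can produce different cluster solutions from the same data.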

Cautions in Cluster Analysis

· Cluster analysis is extremely sensitive to correlation. Every effort should be made to eliminate variables that are correlated with one another; if this is not possible, make sure the clustering is validated. One way to keep correlated variables is to combine them into a factor, that is, a linear combination of the correlated variables.

· Cluster analysis is more of a heuristic than a statistical technique. As such, it does not have the foundation of statistical tests and reasoning (unlike regression analysis, for instance).

· Cluster analysis evolved from many different disciplines and has inherent biases from these disciplines. Thus, the analysis can be biased by the questions asked of it.

· Different methods and different numbers of clusters generate different solutions from the same data, so it is important to validate your findings.
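One way to act on the first caution is to screen the candidate variables for high pairwise correlation before clustering. The Python sketch below is a minimal illustration: the variable names, the data values, and the 0.8 cutoff are all hypothetical, and the Pearson correlation is assumed as the measure.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(variables, threshold=0.8):
    """List variable pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name1, v1), (name2, v2) in combinations(variables.items(), 2):
        r = pearson(v1, v2)
        if abs(r) >= threshold:
            flagged.append((name1, name2, round(r, 3)))
    return flagged

# Hypothetical customer variables (illustrative values only).
variables = {
    "income":   [30, 45, 60, 75, 90],
    "spending": [28, 41, 59, 70, 88],
    "age":      [25, 52, 31, 47, 38],
}

flagged = highly_correlated_pairs(variables)
print(flagged)
```

Pairs that are flagged are candidates for dropping one variable or for combining both into a single factor before the cluster analysis is run.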

A detailed example of the application of cluster analysis in financial services can be found in the paper "An Application of Cluster Analysis in the Financial Services Industry" by Satish Nargundkar and Timothy J. Olzer (May & Speh, Strategic Decision Services, Atlanta, GA). Follow the link given below to get the detailed paper:

http://www.nargund.com/gsu/mgs8040/resource/dm/ClusterPaper.doc

Sources:

· http://janda.org/c10/Lectures/topic09/crosstabsSPSS.htm

· http://www.hks.harvard.edu/fs/pnorris/Classes/A%20SPSS%20Manuals/SPSS%20Statistics%20Brief%20Guide%2017.0.pdf