Sunday 28 August 2011

SPSS

When you’re working in SPSS, you work in one of several windows:

· Data View: The Data View displays your actual data and any new variables you have created.

· Variable View: The Variable View contains the definition of each variable in your data set, including its name, type, label, size, alignment and other information.

o Note: While variables are listed as columns in the Data View, they are listed as rows in the Variable View. In the Variable View, each column holds one kind of information about the variables (for example, name, type or label).

· Output View: The output window is where you see the results of your analyses.

· Draft Output View: The draft view is where you can look at output as it is generated for printing.

· Syntax View: SPSS has never lost its roots as a programming language. Although most of your daily work will be done through the graphical interface, behind the scenes every procedure is expressed as syntax. Syntax is the actual command code that produces a specific output.

Source: http://www.datastep.com/SPSSTutorial_1.pdf

We learnt in class today that there are three types of measures (levels of measurement):

1. Nominal variable: also called a “qualitative variable”, because it describes only a quality, not a quantity. Ex: race, gender, etc.

2. Ordinal variable: also called a “ranking variable”, because it is used to rank objects. The ranks show order, but the numbers have no meaning in themselves (the distance between ranks is not defined). Ex: contest winners.

3. Interval variable: involves ranking as well as a specific numerical distance between values. Ex: income, years of education completed, etc.

Source: http://www.socqrl.niu.edu/myers/univariate.htm
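A widely used guideline is that each level of measurement supports different summary statistics: the mode for nominal data, the median for ordinal data and the mean for interval data. The sketch below illustrates this with made-up example values and a hypothetical `central_tendency` helper; it is an illustration of the guideline, not an SPSS feature.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical example data for the three levels listed above.
gender = ["F", "M", "F", "F", "M"]          # nominal: categories only
contest_rank = [1, 2, 3, 4, 5]              # ordinal: order matters, gaps do not
years_of_education = [10, 12, 12, 16, 18]   # interval: equal distances are meaningful

def central_tendency(values, level):
    """Return the usual measure of central tendency for a level of measurement."""
    if level == "nominal":
        # Only counting is meaningful, so report the most frequent category (mode).
        return Counter(values).most_common(1)[0][0]
    if level == "ordinal":
        # Order is meaningful, so the middle value (median) is appropriate.
        return median(values)
    if level == "interval":
        # Distances are meaningful, so the arithmetic mean can be used.
        return mean(values)
    raise ValueError(f"unknown level: {level}")

print(central_tendency(gender, "nominal"))               # F
print(central_tendency(contest_rank, "ordinal"))         # 3
print(central_tendency(years_of_education, "interval"))  # 13.6
```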

We learnt in class about recoding data into another variable. The one rule to follow is to recode into a different variable, so that the original values are preserved. There is, however, an exception to this rule: sometimes you will want to recode unacceptable values to a flag value that you can use when subsetting your data. For example, suppose you are surveying income and find that values range from 0 to 55000, except for ten values that are all greater than ten million. You cannot go back to confirm the odd values, but you don't want to use them in your calculations. In that case, you might recode anything over, say, 60000 to a flag value such as 99999. In your calculations, you then have a standard value you can use to exclude outliers or suspicious data from your analyses.
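The flagging step can be sketched in Python. The cutoff (60000) and the flag (99999) come from the example above; the income values and the helper names `recode_outliers` and `mean_excluding_flag` are made up for illustration. In SPSS itself the same idea would normally be carried out with the RECODE command, and declaring 99999 as a user-missing value (MISSING VALUES) lets procedures skip it automatically.

```python
FLAG = 99999     # flag value from the example
CUTOFF = 60000   # values above this are treated as suspicious

def recode_outliers(incomes, cutoff=CUTOFF, flag=FLAG):
    """Return a list in which every value above `cutoff` is replaced by `flag`."""
    return [flag if value > cutoff else value for value in incomes]

def mean_excluding_flag(values, flag=FLAG):
    """Average only the values that have not been flagged."""
    kept = [v for v in values if v != flag]
    return sum(kept) / len(kept)

# Hypothetical survey data: two implausible values above ten million.
incomes = [12000, 35000, 55000, 0, 15_000_000, 28000, 12_500_000]
recoded = recode_outliers(incomes)
print(recoded)                       # [12000, 35000, 55000, 0, 99999, 28000, 99999]
print(mean_excluding_flag(recoded))  # 26000.0
```

The flag only works because it lies outside the range of valid values: real incomes stop at 55000, so 99999 can never be mistaken for a genuine answer.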

Measuring association

Typically, the association between two variables is evaluated using bivariate analysis. Bivariate techniques include:

· Cross tabulation

· Regression

· Correlation
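For two interval variables, correlation is usually summarized by Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A minimal from-scratch sketch, assuming interval-level data; the `pearson_r` helper and the data values are hypothetical:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (covariance numerator).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (variance numerators).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: years of education and income for five respondents.
years_of_education = [10, 12, 12, 16, 18]
income = [20000, 26000, 24000, 38000, 45000]
print(round(pearson_r(years_of_education, income), 3))
```

The result always lies between -1 and +1; values near either end indicate a strong linear association.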

Cross Tabs

A cross tabulation is a joint frequency distribution of cases according to two or more classificatory variables. The display of the distribution of cases by their position on two or more variables is the chief component of contingency table analysis.
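A joint frequency distribution is simply a count of cases for every combination of categories. The sketch below builds one from two categorical variables using only the standard library; the `crosstab` helper and the survey data are hypothetical.

```python
from collections import Counter

def crosstab(row_var, col_var):
    """Joint frequency distribution of two categorical variables.

    Returns (row_labels, col_labels, table), where table[i][j] counts the cases
    that have the i-th row category and the j-th column category.
    """
    if len(row_var) != len(col_var):
        raise ValueError("both variables must describe the same cases")
    counts = Counter(zip(row_var, col_var))  # missing combinations count as 0
    row_labels = sorted(set(row_var))
    col_labels = sorted(set(col_var))
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

# Hypothetical survey: gender by preferred mode of transport.
gender = ["F", "M", "F", "M", "F", "F", "M", "M"]
transport = ["bus", "car", "bus", "car", "car", "bus", "bus", "car"]
rows, cols, table = crosstab(gender, transport)

print(f"{'':<6}" + "".join(f"{c:>5}" for c in cols))
for label, cell_counts in zip(rows, table):
    print(f"{label:<6}" + "".join(f"{n:>5}" for n in cell_counts))
```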

The chi-square test can be used to determine whether two categorical variables are statistically independent, that is, whether the distribution of one variable is the same across the categories of the other. A crosstab can also provide measures of association between categorical variables, such as the contingency coefficient, phi, tau and gamma. These measures describe the degree to which the values of one variable predict or vary with those of another. Data requirements: crosstabs require categorical data, or continuous data recoded into categories (such as income or age ranges). Because the chi-square test relies on a large-sample approximation, the expected frequency in each cell should not be too small; a common rule of thumb is at least 5.
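The chi-square statistic compares each observed count with the count expected if the two variables were independent (row total × column total / N). A minimal sketch of that calculation, assuming the table is a list of rows of counts; the helper names and the table are hypothetical. This computes only the statistic: a library such as SciPy (`scipy.stats.chi2_contingency`) also returns the p-value, but note that it applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will differ from this uncorrected one.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

def smallest_expected(table):
    """Smallest expected count, to check the 'at least 5 per cell' rule of thumb."""
    return min(min(row) for row in expected_counts(table))

# Hypothetical 2 x 2 table of counts.
table = [[30, 10], [10, 30]]
print(chi_square(table))         # 20.0
print(smallest_expected(table))  # 20.0
```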

The degrees of freedom for a contingency table equal (number of rows - 1) × (number of columns - 1). Conceptually, the degrees of freedom are the number of cells you are free to fill with any value you want, given that the row and column totals (the margins) are known. In general, it is the number of data points you can specify before all remaining data points are determined. As the degrees of freedom increase, a larger chi-square value is needed to reach statistical significance.
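The "free cells" idea can be checked directly: once the margins are fixed, choosing (rows - 1) × (columns - 1) cells determines every other cell. In this sketch the helpers `degrees_of_freedom` and `complete_table` and the margins are hypothetical, and the code does not check that the forced cells come out non-negative.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def complete_table(free_cells, row_totals, col_totals):
    """Fill in a table from its margins and the (rows - 1) x (cols - 1) free cells."""
    r, c = len(row_totals), len(col_totals)
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must add up to the same N")
    if len(free_cells) != r - 1 or any(len(row) != c - 1 for row in free_cells):
        raise ValueError("expected (rows - 1) x (cols - 1) free cells")
    table = []
    # In each of the first r - 1 rows, the last cell is forced by the row total.
    for i in range(r - 1):
        row = list(free_cells[i])
        row.append(row_totals[i] - sum(row))
        table.append(row)
    # The whole last row is forced by the column totals.
    last_row = [col_totals[j] - sum(table[i][j] for i in range(r - 1)) for j in range(c)]
    table.append(last_row)
    return table

# A 2 x 3 table has 2 degrees of freedom: choosing 2 cells fixes the other 4.
print(degrees_of_freedom(2, 3))                             # 2
print(complete_table([[10, 20]], [40, 60], [30, 50, 20]))   # [[10, 20, 10], [20, 30, 10]]
```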

Source: http://www.datastep.com/SPSSTutorial_2.pdf

Cross tabulations are popular for statistical reporting because they are easy to understand and are laid out in a clear format. They can be used with data at any level of measurement (nominal, ordinal, interval or ratio), because a crosstab treats every variable as a set of categories, that is, as nominal data. A crosstab gives more detailed insight than a single summary statistic while remaining simple to read, and it makes empty or sparse cells easy to spot.

The lambda coefficient measures the strength of association in a crosstab when the variables are measured at the nominal level. Cramér’s V is another measure of the strength of association; it adjusts for the number of rows and columns in the table. Other statistics used to assess association in crosstabs include chi-square, the contingency coefficient, the phi coefficient and Kendall’s tau.
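Both lambda and Cramér's V can be computed directly from a table of counts. This sketch assumes the table is a list of rows of counts, and the helper names are hypothetical. Cramér's V is computed as sqrt(chi2 / (N × (min(rows, cols) - 1))); lambda here is the asymmetric Goodman and Kruskal version that predicts the row variable from the column variable. For a 2 × 2 table, Cramér's V equals the absolute value of phi.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V: chi-square scaled by N and the smaller table dimension."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return sqrt(chi_square(table) / (n * (k - 1)))

def goodman_kruskal_lambda(table):
    """Lambda for predicting the row variable from the column variable.

    Proportional reduction in prediction errors: guessing the modal row category
    overall versus guessing the modal row category within each column.
    """
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    if n == max_row_total:
        raise ValueError("lambda is undefined when all cases fall in one row category")
    sum_col_max = sum(max(col) for col in zip(*table))
    return (sum_col_max - max_row_total) / (n - max_row_total)

# Hypothetical 2 x 2 table of counts.
table = [[30, 10], [10, 30]]
print(round(cramers_v(table), 3))      # 0.5
print(goodman_kruskal_lambda(table))   # 0.5
```

Both measures run from 0 (no association) to 1; lambda can be 0 even when the variables are associated, if the modal row category is the same in every column.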

Companies find data warehouses indispensable. But a data warehouse can hold billions of records, many of which are unrelated to one another. Without the aid of tools, these data will not make any sense to the company. The data are also not homogeneous: they may come from various sources, often from other data suppliers and from warehouses at branches in other geographical locations.

Software applications such as relational database management systems have cross-tabulation functionality that allows end users to correlate and compare any piece of data. Crosstab analysis engines can examine dozens of tables quickly and efficiently, and can even produce full statistical output with a few clicks of the mouse or keystrokes.

Source: http://www.learn.geekinterview.com/data-warehouse/dw-basics/what-is-crosstab.html
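In a relational database, one common way to produce a crosstab is conditional aggregation, where each output column counts the rows matching one category (SUM(CASE ...)). The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `responses` table and its data are hypothetical, and this is only an illustration of the SQL technique, not how any particular product implements its crosstab feature.

```python
import sqlite3

# In-memory database with a hypothetical table of survey responses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (gender TEXT, transport TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("F", "bus"), ("M", "car"), ("F", "bus"), ("M", "car"),
     ("F", "car"), ("F", "bus"), ("M", "bus"), ("M", "car")],
)

# Conditional aggregation: one row per gender, one count column per transport mode.
query = """
    SELECT gender,
           SUM(CASE WHEN transport = 'bus' THEN 1 ELSE 0 END) AS bus,
           SUM(CASE WHEN transport = 'car' THEN 1 ELSE 0 END) AS car
    FROM responses
    GROUP BY gender
    ORDER BY gender
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

The drawback of this approach is that the category values ('bus', 'car') must be written into the query; dedicated crosstab tools generate these columns automatically.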

From

Author: Ashruta S. Shettar

Roll No: 13119

Specialization: Finance.
