## Wednesday, 31 August 2011

### Discriminant Analysis

Today's BA class began with the file ‘Bank Loan’, which we used to learn Discriminant Analysis. In our case the dependent variable was "Previously defaulted", and the independent variables included age in years, level of education, years at current address and many more. We ran the regression analysis to obtain a score that indicates whether a person will default or not.

#### Introduction to Discriminant Analysis

Discriminant analysis is a statistical method that is used by researchers to help them understand the relationship between a "dependent variable" and one or more "independent variables." A dependent variable is the variable that a researcher is trying to explain or predict from the values of the independent variables. Discriminant analysis is similar to regression analysis and analysis of variance (ANOVA). The principal difference between discriminant analysis and the other two methods is with regard to the nature of the dependent variable.

Discriminant analysis requires the researcher to have measures of the dependent variable and all of the independent variables for a large number of cases. In regression analysis and ANOVA, the dependent variable must be a "continuous variable." Such a numeric variable indicates the degree to which a subject possesses some characteristic, so that the higher the value of the variable, the greater the level of the characteristic. A good example of a continuous variable is a person's income.

In discriminant analysis, the dependent variable must be a "categorical variable." The values of a categorical variable serve only to name groups and do not necessarily indicate the degree to which some characteristic is present. An example of a categorical variable is a measure indicating to which one of several different market segments a customer belongs; another example is a measure indicating whether or not a particular employee is a "high potential" worker. The categories must be mutually exclusive; that is, a subject can belong to one and only one of the groups indicated by the categorical variable. While a categorical variable must have at least two values (as in the "high potential" case), it may have numerous values (as in the case of the market segmentation measure).

As the mathematical methods used in discriminant analysis are complex, they are described here only in general terms. We will do this with an example of a simple case in which the dependent variable has only two categories: whether or not a job candidate becomes a high performer.

#### Steps in Discriminant Analysis

There are two basic steps in discriminant analysis.

1) The first involves estimating coefficients, or weighting factors, that can be applied to the known characteristics of job candidates (i.e., the independent variables) to calculate some measure of their tendency or propensity to become high performers. This measure is called a "discriminant function."

2) Second, this information can then be used to develop a decision rule that specifies some cut-off value for predicting which job candidates are likely to become high performers.
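The two steps can be sketched in code. This is only a minimal sketch, not the procedure used in class: it assumes Fisher's linear discriminant as the estimation technique, two made-up predictors (an aptitude score and years of experience), and a cut-off placed midway between the two group mean scores. All data values and function names are hypothetical.

```python
# Minimal sketch of two-group discriminant analysis using Fisher's
# linear discriminant (an assumed technique, chosen for illustration).
# All data are made up: (aptitude score, years of experience).
high_performers = [(8, 5), (9, 6), (7, 6), (8, 7)]
low_performers = [(4, 2), (5, 1), (3, 3), (4, 2)]


def mean_vector(rows):
    """Mean of each predictor over the cases in one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 within-group scatter: sum of outer products of deviations."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def score(coeffs, x):
    """Discriminant score: weighted sum of the predictor values."""
    return sum(b * v for b, v in zip(coeffs, x))


def fit_discriminant(group_high, group_low):
    """Step 1: estimate coefficients. Step 2: derive a cut-off score."""
    m_high = mean_vector(group_high)
    m_low = mean_vector(group_low)
    s_high = scatter_matrix(group_high, m_high)
    s_low = scatter_matrix(group_low, m_low)
    # Pooled within-group scatter matrix.
    sw = [[s_high[i][j] + s_low[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_high[0] - m_low[0], m_high[1] - m_low[1]]
    # Coefficients = inverse pooled scatter times difference of group means.
    coeffs = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
              inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut-off halfway between the scores of the two group means.
    cutoff = (score(coeffs, m_high) + score(coeffs, m_low)) / 2
    return coeffs, cutoff


def predict(coeffs, cutoff, x):
    """Decision rule: scores above the cut-off are predicted high performers."""
    return "high performer" if score(coeffs, x) > cutoff else "low performer"


coeffs, cutoff = fit_discriminant(high_performers, low_performers)
print(coeffs, cutoff)
print(predict(coeffs, cutoff, (7, 5)))
```

The midpoint cut-off treats both kinds of misclassification as equally costly and the two groups as equally common; a different rule would shift the cut-off.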

The tendency of an individual to become a high performer can be written as a linear equation. The values of the various predictors of high performer status (i.e., independent variables) are multiplied by "discriminant function coefficients" and these products are added together to obtain a predicted discriminant function score. This score is used in the second step to predict the job candidate's likelihood of becoming a high performer. Suppose that you were to use three different independent variables in the discriminant analysis. Then the discriminant function has the following form:

$$D = B_1 X_1 + B_2 X_2 + B_3 X_3$$

where $D$ = discriminant function score,
$B_i$ = discriminant function coefficient relating independent variable $i$ to the discriminant function score,
$X_i$ = value of independent variable $i$.
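As a worked illustration of the linear form described above (each predictor multiplied by its coefficient, and the products summed), the sketch below computes a score for one candidate. The coefficient values, the candidate's predictor values and the cut-off are all hypothetical; in practice the coefficients and cut-off would come from the estimation step.

```python
# Hypothetical discriminant function coefficients B1, B2, B3.
coefficients = [0.5, 0.25, 1.0]
# Hypothetical values X1, X2, X3 for one job candidate.
candidate = [6.0, 4.0, 2.0]
# Hypothetical cut-off score for the decision rule.
cutoff = 5.0


def discriminant_score(b, x):
    """Return D = B1*X1 + B2*X2 + ... for matching coefficient/value lists."""
    if len(b) != len(x):
        raise ValueError("need one coefficient per independent variable")
    return sum(bi * xi for bi, xi in zip(b, x))


d = discriminant_score(coefficients, candidate)  # 3.0 + 1.0 + 2.0 = 6.0
prediction = "high performer" if d > cutoff else "low performer"
print(d, prediction)
```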

The equation is quite similar to a regression equation. Even so, conventional regression analysis should not be used in place of discriminant analysis, because the dependent variable would have only two values (high performer and low performer) and would thus violate important assumptions of the regression model. Discriminant analysis does not have these limitations with respect to the dependent variable.

KETAN MARWAH

FINANCE - GROUP 1