Wednesday 31 August 2011

Discriminant Function Analysis

The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both group membership and the values of the interval variables are known.

The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict group membership.

In today’s example, the summary table and the group statistics (means, in unstandardized form) show that 150 cases have missing information; these come after the first 700 cases. For these 150 cases we want an equation that tells us whether each one will default, and thus whether to give the loan or not. In the group statistics, the yes and no groups are compared on factors such as age, level of education and debt-to-income ratio. Our hypothesis can be checked against these factors.

This is one of the most critical methods for comparing the actual data with the forecasted data. The classification results show how many cases are predicted correctly, and we can calculate the accuracy of the prediction as the number of correct predictions divided by the number of cases whose actual group is known. If this accuracy is above 50%, we can term it a good prediction. If it is below 50%, we need to revisit the variables and drop a few if required.
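The accuracy calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the five sample labels are hypothetical and are not taken from the loan data set.

```python
def classification_accuracy(actual, predicted):
    """Share of cases whose predicted group matches the actual group."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels, for illustration only (not the real loan data).
actual = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "no"]

accuracy = classification_accuracy(actual, predicted)
print(f"Accuracy: {accuracy:.0%}")
```

Here four of the five hypothetical cases match, so the accuracy is well above the 50% mark mentioned above.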

We can use a regression-style formula, i.e. a = b + cx, which here becomes score = constant + (coeff1 * var1 + coeff2 * var2 + ...)

From the Canonical Discriminant Function Coefficients table (unstandardized coefficients), we compute a new variable using the transform / compute variable step: we take the constant from the table and add each factor multiplied by its coefficient. This gives the score for each case.

An example of the working: -0.705 + 0.015*age + 0.078*EA... = 2.11 (in this case)
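The same working can be sketched in Python. Only the constant and the two coefficients shown in the example are used; the remaining terms are elided in the example, so this sketch does not reproduce the 2.11 result. The function name and the input values for age and EA are hypothetical.

```python
def discriminant_score(constant, coefficients, case):
    """Constant plus the sum of coefficient * value over each factor."""
    return constant + sum(coef * case[name] for name, coef in coefficients.items())


# Constant and the two coefficients visible in the worked example.
# The other terms are not given there, so they are left out here.
CONSTANT = -0.705
COEFFICIENTS = {"age": 0.015, "EA": 0.078}

# Hypothetical applicant, for illustration only.
case = {"age": 40, "EA": 10}

score = discriminant_score(CONSTANT, COEFFICIENTS, case)
print(f"Partial score from the two visible terms: {score:.3f}")
```

With the full coefficient table, the dictionary would simply hold one entry per factor.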

Here, 2.11 is the score that represents the prediction of whether a person will default or not. This score is compared with the mean of the functions at group centroids. If the score is greater than the mean, we can assume the case is a default; if it is less than the mean, it is not a default. We can then check the classification results to see how many predictions were correct, which gives us the accuracy of the predictions. If the score is close to the mean, the decision rests on the intuition or experience of the person giving the loan.
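The decision rule above can be sketched as follows. The centroid values and the width of the "close to the mean" band are hypothetical, since they are not given here, and the sketch assumes, as in the text, that scores above the mean of the centroids indicate default.

```python
def classify(score, centroid_default, centroid_no_default, margin=0.25):
    """Compare a score with the mean of the two group centroids.

    Assumes the default group lies on the higher side of the cutoff.
    `margin` is a hypothetical band around the cutoff in which the
    case is left to the judgment of the loan officer.
    """
    cutoff = (centroid_default + centroid_no_default) / 2
    if abs(score - cutoff) <= margin:
        return "review"
    return "default" if score > cutoff else "no default"


# Hypothetical centroids, for illustration only.
CENTROID_DEFAULT = 0.9
CENTROID_NO_DEFAULT = -0.3

high = classify(2.11, CENTROID_DEFAULT, CENTROID_NO_DEFAULT)
near = classify(0.4, CENTROID_DEFAULT, CENTROID_NO_DEFAULT)
low = classify(-1.0, CENTROID_DEFAULT, CENTROID_NO_DEFAULT)
print(high, near, low)
```

Cases that come back as "review" are the ones near the mean; the others are the extreme cases that can be decided directly.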

With this method we can set aside the extreme cases, which do not need to be checked individually, making the task much easier.

Author:- Ishan Tupe
Group :- Operation 3
