BA@SIBMB: August 2011

Wednesday 31 August 2011

Discriminant Analysis - An overview!

Discriminant Analysis plays a significant role in pattern recognition, analysis of variance, and assessing the adequacy of a classification. It is also used alongside cluster analysis, for example to check how well the clusters found can be separated.

It is generally classified into Linear Discriminant Analysis, Multiple Discriminant Analysis (Factor and Canonical Discriminant Analysis) and K-NN Discriminant Analysis.

For beginners, Linear Discriminant Analysis (LDA) is the preferred methodology.

The end result of discriminant analysis is a model that allows prediction of group membership when only the interval (predictor) variables are known. A second purpose is to understand the data set: a careful examination of the resulting prediction model can give insight into the relationship between group membership and the variables used to predict it.

An example of LDA: a graduate admissions committee might divide a set of past graduate students into two groups, those who finished the program in five years or less and those who did not. Discriminant function analysis could then be used to predict successful completion of the graduate program based on GRE score and undergraduate grade point average. Examining the prediction model might show how each predictor, individually and in combination, predicted completion or non-completion of the program.
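To make the admissions example concrete, here is a minimal sketch of two-group Fisher LDA with two predictors in plain Python. The GRE and GPA figures are invented for illustration, not data from the post, and the function names `fit_fisher_lda` and `predict` are hypothetical. The weights are the inverse pooled within-group covariance matrix applied to the difference in group means; the cutoff sits halfway between the projected group means, which assumes equal group sizes (equal priors).

```python
# Two-group Fisher LDA with two predictors (GRE score, undergraduate GPA).
# All data below are hypothetical and exist only to illustrate the mechanics.

def mean(values):
    return sum(values) / len(values)

def fit_fisher_lda(group_a, group_b):
    """Return (weights, cutoff); a point x is assigned to group A if w.x > cutoff.

    Each group is a list of (x1, x2) tuples.
    """
    ma = (mean([p[0] for p in group_a]), mean([p[1] for p in group_a]))
    mb = (mean([p[0] for p in group_b]), mean([p[1] for p in group_b]))

    # Pooled within-group covariance matrix [[s11, s12], [s12, s22]]
    s11 = s12 = s22 = 0.0
    for group, m in ((group_a, ma), (group_b, mb)):
        for x1, x2 in group:
            d1, d2 = x1 - m[0], x2 - m[1]
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
    dof = len(group_a) + len(group_b) - 2
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof

    # Weights w = S^-1 (mean_a - mean_b), using the closed-form 2x2 inverse
    det = s11 * s22 - s12 * s12
    dm1, dm2 = ma[0] - mb[0], ma[1] - mb[1]
    w = ((s22 * dm1 - s12 * dm2) / det, (s11 * dm2 - s12 * dm1) / det)

    # Cutoff halfway between the two projected group means (equal priors)
    cutoff = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, cutoff

def predict(w, cutoff, x):
    return "finished" if w[0] * x[0] + w[1] * x[1] > cutoff else "did not finish"

# Hypothetical past students: (GRE score, GPA)
finished = [(320, 3.6), (315, 3.4), (325, 3.8), (310, 3.5), (318, 3.7)]
not_finished = [(300, 3.0), (305, 3.2), (298, 3.1), (310, 2.9), (302, 3.3)]

w, cutoff = fit_fisher_lda(finished, not_finished)
print(predict(w, cutoff, (322, 3.5)))  # prints: finished
```

Because GRE and GPA are on very different scales, the raw weights are not directly comparable; standardized coefficients are needed to judge each predictor's relative contribution.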

The lectures so far have focused on two-variable LDA, so it would not be appropriate to delve into MDA or Kernel Discriminant Analysis here.

Written by: Umang Arora

Group: Marketing1

Discriminant Function Analysis

The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables. The procedure begins with a set of observations where both group membership and the values of the interval variables are known. The end result of the procedure is a model that allows prediction of group membership when only the interval variables are known. A second purpose of discriminant function analysis is an understanding of the data set, as a careful examination of the prediction model that results from the procedure can give insight into the relationship between group membership and the variables used to predict group membership.

An example might predict whether patients recovered from a coma or not based on combinations of demographic and treatment variables. The predictor variables might include age, sex, general health, time between incident and arrival at hospital, various interventions, etc. In this case the creation of the prediction model would allow a medical practitioner to assess the chance of recovery based on observed variables. The prediction model might also give insight into how the variables interact in predicting recovery.

A single interval variable might discriminate between groups in an almost perfect fashion, not at all, or somewhere in between. For example, if one wished to differentiate adult males and females, one could collect information on how many bras the person owned, score on the last statistics test, and height. In the case of the number of bras, the discrimination would be very good, but not perfect (some women don't own any bras, some men do). In the case of the score on the last statistics test, little discrimination would be possible because males and females generally score about the same. In the case of height, some discrimination between adult males and females would be possible, but it would be far from perfect.

In general, the larger the difference between the means of the two groups relative to the within groups variability, the better the discrimination between the groups.
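This rule of thumb can be expressed as one number per variable: the squared difference between the two group means divided by the pooled within-group variance (a one-variable Fisher ratio). The sketch uses Python's standard `statistics` module; the heights and test scores are made-up values chosen to mirror the male/female example above, not real measurements, and `fisher_ratio` is a hypothetical helper name.

```python
from statistics import mean, variance

def fisher_ratio(group1, group2):
    """Squared difference of group means over the pooled within-group variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) ** 2 / pooled_var

# Hypothetical measurements for adult males and females
height_m = [178, 182, 175, 185, 170, 180]
height_f = [165, 160, 170, 162, 168, 158]
score_m = [72, 65, 80, 58, 90, 70]
score_f = [70, 68, 78, 62, 85, 72]

print(f"height: {fisher_ratio(height_m, height_f):.2f}")
print(f"statistics test score: {fisher_ratio(score_m, score_f):.2f}")
```

A ratio near zero (the test score here) means the variable barely separates the groups, while a large ratio (height here) means good separation. Discriminant analysis extends the same idea to a weighted combination of several variables.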

In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used. The use of discriminant analysis in marketing can be described by the following steps:

1. Formulate the problem and gather data

2. Estimate the Discriminant Function Coefficients and determine the statistical significance and validity

3. Plot the results on a two-dimensional map, define the dimensions, and interpret the results.

While discriminant analysis is often used in marketing research for marketing segmentation and predicting group membership, there are more powerful and accurate techniques available.

http://en.wikipedia.org/wiki/Linear_discriminant_analysis#Marketing

Written by: Ajvad Rehmani

Group: Marketing1

Discriminating the 'Discriminant Analysis' Technique

Discriminant Analysis attempts to find a rule that separates predefined groups to the maximum possible extent.

Discriminant Analysis - Assumptions

The underlying assumptions of Discriminant Analysis (DA) are:

– Each group is normally distributed, although discriminant analysis is relatively robust to departures from normality.

– The groups defined by the dependent variable exist a priori.

– The predictor variables, Xk, are multivariate normally distributed, independent, and non-collinear.

– The variance/covariance matrices of the predictor variables are the same across the groups in the population (i.e., homogeneous).

– The relationship is linear in its parameters

– Absence of leverage point outliers

– The sample is large enough: unequal sample sizes are acceptable, but the size of the smallest group needs to exceed the number of predictor variables. As a “rule of thumb”, the smallest group should have at least 20 cases for a few (4 or 5) predictors. The maximum number of independent variables is n - 2, where n is the sample size. While such a small sample may work, it is not encouraged; it is generally best to have 4 or 5 times as many observations as independent variables.

– Errors are randomly distributed
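The sample-size guidelines in this list can be turned into a quick pre-flight check. This is a hypothetical helper, not part of SPSS. Applying the "at least 20 in the smallest group" guideline regardless of the number of predictors, and using 4 observations per predictor as the lower bound, are simplifying assumptions.

```python
def check_sample_size(group_sizes, n_predictors):
    """Return warnings for the DA sample-size rules of thumb (hypothetical helper)."""
    warnings = []
    n = sum(group_sizes)
    smallest = min(group_sizes)
    if smallest <= n_predictors:
        warnings.append("smallest group does not exceed the number of predictors")
    if smallest < 20:  # simplification: applied whatever the number of predictors
        warnings.append("smallest group has fewer than 20 cases")
    if n_predictors > n - 2:
        warnings.append("more predictors than n - 2")
    if n < 4 * n_predictors:  # lower end of the 4-5 observations per predictor guideline
        warnings.append("fewer than 4 observations per predictor")
    return warnings

print(check_sample_size([40, 25], n_predictors=5))  # []
print(check_sample_size([15, 3], n_predictors=5))
```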

Drawbacks of Discriminant Analysis

– An important drawback of discriminant analysis is its dependence on a relatively equal distribution of group membership. If one group within the population is substantially larger than the other, as is often the case in real life, discriminant analysis might classify almost all observations into the larger group. An equal good-bad sample should therefore be chosen for building the discriminant analysis model.

– Another significant restriction of discriminant analysis is that it can’t handle categorical independent variables.

– Discriminant analysis is more rigid than logistic regression in its assumptions. In contrast to ordinary linear regression, discriminant analysis does not have unique coefficients. Each of the coefficients depends on the other coefficients in the estimation and therefore there is no way of determining the absolute value of any coefficient.

Discriminant Analysis Vs Logistic Regression

– Similarity: Both techniques examine an entire set of interdependent relationships

Discriminant Analysis Vs ANOVA

– Similarity: Both techniques examine an entire set of interdependent relationships

– Difference: In discriminant analysis the independent variables are metric, whereas in ANOVA they are categorical.

References

http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discrim.pdf
http://www.shsu.edu/~icc_cmf/cj_742/stats7.doc

Author: Pratik Pawar

Group: Marketing - Group 4

Doing Discriminant Analysis on SPSS

Discriminant analysis is used to differentiate between groups, usually two. It is closely related to regression: it studies the effect of the independent variables on a categorical dependent variable.

The steps for a discriminant analysis are as follows:

1. Formulate the problem

2. Determine the discriminant function coefficients that result in the highest ratio of between-group variation to within-group variation.

3. Test the significance of the discriminant function.

4. Interpret the results.

5. Determine the validity of the analysis.

The steps to be followed in the SPSS software to get the relevant data are:

1. First enter the grouping variable. Then define its lowest and highest coded values by clicking the Define Range button. Finally, select the independent variables in the ‘Independents:’ box.

2. Button Statistics: Here you can indicate those statistics that are desired in discriminant analysis. Often these include means, univariate ANOVAs, unstandardized Function Coefficients.

3. Button Classify: Many classification options can be selected here, such as prior probabilities and plots. Also, a summary table can be requested.

4. Button Save: This option allows you to save as new variables: Predicted group membership, Discriminant Scores and Probabilities of group membership.

5. One of the most important values in the output is the eigenvalue of each canonical discriminant function. The eigenvalue is the ratio of the between-groups to the within-groups sum of squares of the discriminant scores; when there are several functions, SPSS also reports the percentage of variance each one explains. A large eigenvalue is associated with a strong function. The canonical correlation is the correlation between the discriminant scores and the levels of the dependent variable. A high correlation indicates a function that discriminates well.

6. Wilks' lambda is another important value. It is the ratio of the within-groups sum of squares to the total sum of squares, i.e. the proportion of the total variance in the discriminant scores not explained by differences among groups. A lambda of 1.00 occurs when the observed group means are equal (all the variance is explained by factors other than the differences between those means), while a small lambda occurs when within-groups variability is small compared to the total variability. A small lambda therefore indicates that the group means appear to differ; the associated significance value indicates whether the difference is significant.

7. ‘Functions at Group Centroids’ gives the average discriminant score for subjects in each of the two groups. More specifically, it is the discriminant score obtained for each group when the group's variable means (rather than individual values for each subject) are entered into the discriminant equation.

8. The ‘Canonical Discriminant Function Coefficients’ table gives the unstandardized coefficients for the independent variables, i.e. the coefficients of the unstandardized discriminant equation. Each subject’s discriminant score is computed by entering his or her raw values for each variable into this equation.
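The quantities in steps 5 to 7 can all be reproduced from the discriminant scores themselves. The sketch below assumes a single discriminant function (the two-group case) and uses made-up scores; SPSS computes these internally, and the function name `summarize_scores` is hypothetical. It shows the eigenvalue as between-groups over within-groups sum of squares, Wilks' lambda as within over total, and the canonical correlation as the square root of between over total.

```python
import math

def summarize_scores(scores_by_group):
    """Centroids, eigenvalue, canonical correlation and Wilks' lambda
    for one discriminant function, from each group's discriminant scores."""
    all_scores = [s for group in scores_by_group for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    centroids = [sum(group) / len(group) for group in scores_by_group]

    ss_between = sum(len(group) * (c - grand_mean) ** 2
                     for group, c in zip(scores_by_group, centroids))
    ss_within = sum((s - c) ** 2
                    for group, c in zip(scores_by_group, centroids) for s in group)
    ss_total = ss_between + ss_within

    return {
        "centroids": centroids,  # corresponds to 'Functions at Group Centroids'
        "eigenvalue": ss_between / ss_within,
        "canonical_correlation": math.sqrt(ss_between / ss_total),
        "wilks_lambda": ss_within / ss_total,
    }

# Hypothetical discriminant scores for two groups
result = summarize_scores([[1.2, 0.8, 1.5, 0.9, 1.1], [-1.0, -0.7, -1.3, -0.9, -1.1]])
for name, value in result.items():
    print(name, value)
```

Because the total sum of squares splits exactly into between and within parts, Wilks' lambda here always equals 1 / (1 + eigenvalue). These ratios do not change if the scores are rescaled or shifted, so they hold whatever scaling the software applies to the scores.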

Author: Gayatri Nair

Group: Marketing - Group 4

Applications of Discriminant Analysis

Discriminant analysis is a statistical technique widely used in the business world. Discriminant analysis uses a collection of interval variables to predict a categorical variable that may be a dichotomy or have more than two values.

The technique involves finding a linear combination of independent variables (predictors) – the discriminant function – that creates the maximum difference between group membership in the categorical dependent variable.

Thus, DA is used when:

· the dependent variable is categorical and the independent (predictor) variables are interval-level, such as age, income, attitudes, perceptions, and years of education.

· there are more than two dependent variable categories, unlike binary logistic regression, which is limited to a dichotomous dependent variable.

Discriminant analysis is used to forecast a variety of outcomes that affect the profitability of a business. Classic examples of the application of discriminant analysis include:

Performing a default-risk evaluation of loan applicants
Benchmarking potential job applicants
Forecasting insurance risk
Predicting academic performance from historical data
Developing auditing patterns
Fraud management

Discriminant analysis is most often used to help researchers analyze the group or category to which a subject belongs. Let us look at two examples.

Judging the credit worthiness of a loan-applicant

Discriminant analysis has been used with success in consumer credit and other forms of instalment lending in which various characteristics of an individual are quantitatively rated and a credit decision is made on the basis of the total score. The plastic credit cards many of us carry often are given out on the basis of a credit scoring system that takes into account such things as age, occupation, duration of employment, home ownership, years of residence, telephone, and annual income.

Numerical rating systems also are used by companies extending trade credit. With the overall growth of trade credit, a number of companies are finding it worthwhile to screen out "clear" accept and reject applicants. In other words, routine credit decisions are made on the basis of a numerical score.

Marginal applicants, who fall between "clear" accept or reject signals, can then be analyzed in detail by the credit analyst. In this way, a company is able to achieve greater efficiency in its credit investigation process. It uses trained credit analysts to the best advantage.
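The screening process described above amounts to a score plus two cutoffs. In this sketch the weights, constant, variable names and cutoffs are all hypothetical placeholders; a real scorecard would take its coefficients from a fitted discriminant function.

```python
# Hypothetical discriminant weights for a few applicant characteristics
WEIGHTS = {"years_employed": 0.30, "years_at_residence": 0.15, "income_thousands": 0.02}
CONSTANT = -3.0

def credit_score(applicant):
    """Linear score: constant plus weighted sum of the applicant's characteristics."""
    return CONSTANT + sum(weight * applicant[name] for name, weight in WEIGHTS.items())

def credit_decision(score, reject_below=-1.0, accept_above=1.0):
    """Clear reject, clear accept, or refer marginal cases to a credit analyst."""
    if reject_below >= accept_above:
        raise ValueError("reject_below must be less than accept_above")
    if score < reject_below:
        return "reject"
    if score > accept_above:
        return "accept"
    return "refer to credit analyst"

applicants = [
    {"years_employed": 10, "years_at_residence": 8, "income_thousands": 60},
    {"years_employed": 1, "years_at_residence": 1, "income_thousands": 20},
    {"years_employed": 4, "years_at_residence": 3, "income_thousands": 40},
]
for a in applicants:
    s = credit_score(a)
    print(round(s, 2), credit_decision(s))
```

Widening the gap between the two cutoffs sends more applicants to the analyst; narrowing it automates more decisions.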

Judging the suitability of a candidate for a job

When individuals are interviewed for a job, managers do not know for sure how job candidates will perform on the job if hired. Suppose, however, that a human resource manager has a list of current employees who have been classified into two groups: "high performers" and "low performers." These individuals have been working for the company for some time, have been evaluated by their supervisors, and are known to fall into one of these two mutually exclusive categories.

The manager also has information on the employees' backgrounds: educational attainment, prior work experience, participation in training programs, work attitude measures, personality characteristics, and so forth. This information was known at the time these employees were hired. The manager wants to be able to predict, with some confidence, which future job candidates are high performers and which are not.

A researcher or consultant can use discriminant analysis, along with existing data, to help in this task.

There are two basic steps in discriminant analysis. The first involves estimating coefficients, or weighting factors, that can be applied to the known characteristics of job candidates (i.e., the independent variables) to calculate some measure of their tendency or propensity to become high performers. This measure is called a "discriminant function." Second, this information can then be used to develop a decision rule that specifies some cut-off value for predicting which job candidates are likely to become high performers.

The tendency of an individual to become a high performer can be written as a linear equation. The values of the various predictors of high performer status (i.e., independent variables) are multiplied by "discriminant function coefficients" and these products are added together to obtain a predicted discriminant function score.

This score is used in the second step to predict the job candidate's likelihood of becoming a high performer.
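The two steps can be sketched as a score computed from coefficients, followed by a cut-off between the group centroids. The coefficients, predictors and centroids are hypothetical. For the cut-off, this sketch assumes a size-weighted average of the two centroids, which reduces to the midpoint when both groups are the same size and moves toward the smaller group's centroid otherwise; that weighting is one common convention, not the only option.

```python
def discriminant_score(constant, coefficients, values):
    """Constant plus the sum of coefficient * predictor value."""
    return constant + sum(b * x for b, x in zip(coefficients, values))

def cutting_score(centroid_high, n_high, centroid_low, n_low):
    """Cut-off between two group centroids, weighted by group size.

    Each centroid is weighted by the size of the *other* group, so with
    equal groups this is the midpoint of the two centroids.
    """
    return (n_high * centroid_low + n_low * centroid_high) / (n_high + n_low)

def predict_performer(score, cutoff):
    # Assumes the high performers' centroid lies above the low performers'
    return "high performer" if score > cutoff else "low performer"

# Hypothetical coefficients for: years of education, years of prior experience,
# number of training programs attended
constant, coefficients = -4.0, (0.5, 0.3, 0.8)
cutoff = cutting_score(centroid_high=1.0, n_high=30, centroid_low=-1.0, n_low=10)

candidate = (4, 3, 1)
score = discriminant_score(constant, coefficients, candidate)
print(round(score, 2), round(cutoff, 2), predict_performer(score, cutoff))
```

Here the candidate scores -0.3. With equal group sizes the cut-off would be 0 and the candidate would be classed as a low performer; the larger high-performer group moves the cut-off to -0.5, so the same candidate is classed as a high performer.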

There are more complicated cases in which the dependent variable has more than two categories. Discriminant analysis can handle these cases as well, and this is where it scores over regression analysis, which requires a metric dependent variable. The interpretation of the discriminant function scores and coefficients, however, becomes more complex.

Author: Aniruddha Dasgupta

Group: Finance 3

Discriminant Analysis

In today’s class we were introduced to the concept of Discriminant analysis. The main purpose of a discriminant function analysis is to predict group membership based on a linear combination of the interval variables.

In simple words, the analysis helps in finding which group or classification an item or object belongs to, based on certain characteristics. It differs from group-building techniques such as cluster analysis in that the groups to choose from must be known in advance.

To take an example, let us assume that we have data on 80 students in the Business Analytics class. We know which students want a job in Analytics and Data Modelling and which want a job in Sales and Distribution. We need to predict group membership from independent variables, which may include engineering background, age, gender and years of work experience.

The discriminant function analysis thus helps to predict group membership when only the independent variables are known. It shows the relationship between the dependent variable (students interested in an analytics job versus students interested in a sales job) and the predictor variables (age, gender, years of work experience, educational background, etc.). The analysis shows that students with an engineering background and students with more work experience wanted jobs in the analytics industry, while the students wanting sales jobs tended to have no work experience, to be younger (22-24 years) and to be male.

Thus, this model helps to predict whether a student belongs to the Analytics group or the Sales group based on observed variables.

In class, the concept was explained with the help of a “Bank Loan” example in which the dependent variable was “previously defaulted” and the independent variables were age, level of education, years at current address, etc. We ran the analysis to find out whether an individual would default or not: a score was computed for each individual, and that score was the basis for our conclusion.

Another scenario where discriminant analysis can be used is to find the variables that predict whether a student uses a cell phone in class. The observed variables could be age, academic grades, committee membership, whether the student has a girlfriend/boyfriend, etc. Thus we can find the characteristics needed to classify students into two groups: students using cell phones and students not using cell phones. The relationship established can be used to predict group membership, so that every class gets a proper mix and students from the same group are not all placed in the same class.

This analysis can also be used to predict which customers take out an insurance policy after meeting the company's agent. The relevant characteristics include gender, age group, income level, education level, awareness about insurance, etc. A model can thus be established to understand the data set and identify the characteristics affecting the decision. The analysis can then be used to assess training needs for agents meeting different classes of customers, in order to increase conversion.

SAPNA BATRA

Finance - GROUP 1

Wilks' Lambda, Discriminant Analysis and Its Weaknesses

Today’s class was about discriminant analysis in SPSS. Based on my understanding from class and some research on the net, here are a few things about discriminant analysis and Wilks' lambda. I have also mentioned some of the problems with discriminant analysis.

To start with, discriminant analysis works by creating a new variable that is a combination of the original predictors. This is done in such a way that the differences between the predefined groups, with respect to the new variable, are maximised. Because there are two classes, the maximum number of discriminant functions is one, and each function has an associated eigenvalue. Larger eigenvalues are associated with greater class separation, and they can also be used to measure the relative importance of discriminant functions in multi-class analyses.

Eigenvalues can be converted into Wilks' lambda, a multivariate test statistic whose value ranges between zero and one. Values close to zero indicate that the class means are different, while values close to one indicate that they are not; Wilks' lambda is one when the means are identical. For a two-class problem Wilks' lambda equals 1/(1+λ), where λ is the eigenvalue, and it can be converted into a chi-square statistic so that a significance test can be applied. Wilks' lambda can also be converted into a canonical correlation coefficient by taking the square root of 1 - Wilks' lambda. The canonical correlation is the square root of the ratio of the between-groups sum of squares to the total sum of squares; squared, it is the proportion of the total variability explained by differences between classes. Thus, if all of the variability in the predictors were a consequence of the group differences, the canonical correlation would be one, while if none of the variability were due to group differences, it would be zero.

An assumption of discriminant analysis is that there is no evidence of a difference between the covariance matrices of the classes. There are formal significance tests for this assumption (e.g. Box's M), but they are not very robust. In particular they are generally thought to be too powerful, i.e. the null hypothesis is rejected even when there are minor differences, and Box's M is also susceptible to deviations from multivariate normality (another assumption).
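The relationships described above can be written compactly. Here Λ is Wilks' lambda, λ_i the eigenvalue of the i-th of m discriminant functions, SS_B, SS_W and SS_T the between-groups, within-groups and total sums of squares of the discriminant scores, n the number of cases, p the number of predictors and g the number of groups. The chi-square line is Bartlett's approximation, the usual choice, though it is an assumption that this is the conversion meant above.

```latex
\begin{aligned}
\Lambda &= \prod_{i=1}^{m} \frac{1}{1+\lambda_i}
  &&\text{($m$ discriminant functions)}\\
\Lambda &= \frac{1}{1+\lambda} = \frac{SS_W}{SS_T}
  &&\text{(two groups, one function)}\\
R_c &= \sqrt{\frac{SS_B}{SS_T}} = \sqrt{\frac{\lambda}{1+\lambda}} = \sqrt{1-\Lambda}\\
\chi^2 &= -\Bigl[n - 1 - \tfrac{p+g}{2}\Bigr]\ln\Lambda, \qquad df = p\,(g-1)
\end{aligned}
```

The step from the second to the third line uses λ/(1+λ) = 1 - 1/(1+λ) = 1 - Λ.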

Each discriminant function can be summarised by three sets of coefficients:

(1) standardised canonical discriminant function coefficients (weights);

(2) correlations between the discriminating variables and the standardised canonical discriminant function; and

(3) unstandardised canonical discriminant function coefficients (weights).

Weaknesses of Discriminant Analysis

Inconsistent - Because Fisher's and Mahalanobis's approaches to discriminant analysis involve different procedures, the resulting solutions are not always alike.

Unintuitive - Due to the complexity of its mechanics, discriminant analysis is an unwieldy tool for all but the mathematically savvy. Similar tools, such as multiple regression, are just as flexible without carrying along the intricacy and specificity associated with discriminant analysis.

Prediction - The prediction available through discriminant analysis is not true prediction. Discriminant analysis can only tell a researcher the likely grouping of a certain data point; it cannot tell the researcher other properties of the data point, and the probability of membership in the assigned group that it can report is only as trustworthy as its normality and equal-covariance assumptions.
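Under the usual LDA assumptions (normal scores within each group, equal within-group variance), a discriminant score can in fact be converted into a probability of group membership, which is what SPSS's "Probabilities of group membership" save option provides. A minimal sketch for two groups, assuming the scores are scaled so the pooled within-group variance is 1 (the usual scaling of canonical discriminant scores); the function name and centroids are hypothetical. It also shows how prior probabilities shift the result, which is the mechanism behind the unequal-group-size drawback noted in an earlier post.

```python
import math

def posterior_group1(z, centroid1, centroid2, prior1=0.5):
    """P(group 1 | discriminant score z) for two groups.

    Assumes scores are normal within each group with a common variance of 1,
    centred on the group centroids.
    """
    if not 0.0 < prior1 < 1.0:
        raise ValueError("prior1 must be strictly between 0 and 1")
    # Log-odds of group 1 vs group 2: likelihood ratio plus prior odds
    log_odds = (centroid1 - centroid2) * (z - (centroid1 + centroid2) / 2)
    log_odds += math.log(prior1 / (1.0 - prior1))
    # Logistic transform, written to avoid overflow for large |log_odds|
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Hypothetical centroids: group 1 at +1.0, group 2 at -1.0
for z in (-1.0, -0.5, 0.0, 1.0):
    equal = posterior_group1(z, 1.0, -1.0)
    skewed = posterior_group1(z, 1.0, -1.0, prior1=0.9)
    print(f"z={z:+.1f}  equal priors: {equal:.3f}  prior1=0.9: {skewed:.3f}")
```

With equal priors a score of -0.5 gives group 1 a probability of about 0.27; setting prior1 to 0.9 raises it to about 0.77, so the case is assigned to the larger group even though its score lies closer to the other centroid.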

In spite of all these weaknesses, discriminant analysis has a wide range of applications in all fields.

Group- HR 1

Author- Ankita Kanojia

Discriminant Analysis: A different take!

So, we were learning discriminant analysis using SPSS in class today. Now I wonder: what is the real use of discriminant analysis for a layman? For all we know, a layman may never have heard of discriminant analysis. Yet a lot of the things he sees around him may be influenced by it, and we possibly never pay heed to such a discriminating factor in our lives.

There have been several examples of Discriminant Analysis’ usage in sports, but this is a take which blew my mind:

(1979)... employed a statistical treatment called "discriminant analysis," which, based upon combinations of four variables (two for size, one for fat, one for strength), could place each player into his position with a very high degree of accuracy. This has important implications for selection of players when they make the transition from college to professional football. Finally, we again employed discriminant analysis within the positions, adding variables of muscle deficits, injury history, and playing time and were able to rank players into either the injured or noninjured categories. These equations had a sensitivity of 93.7 per cent and specificity of 96.1 per cent, with the overall injury rate of 38 per cent. This indicates that some characteristics may abet injury. The discriminant analysis allows for selection of variables to statistically profile the football player. This technique addresses the multiple factors that contribute to the success or injury of the player and should be of use in profiling any sport, especially on the professional level....

Talk about leaving fans’ hearts in their mouths based on mathematics! Next time you go to watch a game, you never know whether the selection has been discriminant or not! If you feel your team is underperforming and behaving strangely, don’t blame it on match fixing, blame it on discriminant analysis! On a serious note, though, the amount of work the researchers above put in shows their determination to bring together two different fields, namely sports and mathematics. It was definitely a step forward!

Another research study from Berkeley, “Predicting the Atlanta Falcons Play Calling using Discriminant Analysis”, says:

“This study investigated the ability of discriminant analysis to predict the offensive play calling of the 2005 Atlanta Falcons. Data was collected on each of the 988 offensive plays run from scrimmage by the Atlanta Falcons during the 2005 NFL season. Independent variables included game location (home vs. away), down, yards to go, field position, score, offensive formation, opponent’s defensive rank against both the run and the pass, weather and field surface (turf vs. grass). The response variable was categorized into either a short pass (5 yards or less), medium pass (6 to 15 yards), long pass (more than 15 yards), run, or scramble (by Michael Vick).

A linear discriminant function was developed to predict play calling based on the independent variables. Based on a cross validation procedure, the model was able to correctly predict the play called 40.38 percent of the time. While this rate is not high, the model was able to predict each play with greater accuracy than the relative frequency that each play was run. Considering that the Falcons coaches said they only use frequencies, the use of discriminant analysis is an intriguing possibility for NFL coaches.”

The question is how far mathematics and statistical data can go in things like sports. Sport is usually a very emotion-centric way of life, prone to both human emotions and errors; what is right this season can change the next. If, however, such models come to exist in everyday circumstances, they can surely put a gag on the thriving betting business!

Discriminant analysis has been used far and wide, from measuring customer satisfaction with airlines and mobile operators to computing credit scores and analyzing football players. The question is, why should this knowledge be kept only among those who seek it? If airlines and mobile operators are doing such exercises, they should publish the results so that the knowledge spreads. It always helps.

In my view, the best application of discriminant analysis is bankruptcy prediction. This kind of information should reach the public, so that we know precisely when we are going under. See, my point stands vindicated: we really need this information to be spread in the public arena.
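A well-known instance of this application is Edward Altman's 1968 Z-score, a discriminant function for predicting bankruptcy of publicly traded manufacturing firms. The coefficients and zone boundaries below are the commonly cited values for that original model; the function names and the example firm's ratios are invented, and the sketch is illustrative rather than a credit tool.

```python
def altman_z(wc_ta, re_ta, ebit_ta, mve_tl, sales_ta):
    """Original Altman (1968) Z-score from five financial ratios.

    wc_ta:    working capital / total assets
    re_ta:    retained earnings / total assets
    ebit_ta:  earnings before interest and taxes / total assets
    mve_tl:   market value of equity / book value of total liabilities
    sales_ta: sales / total assets
    """
    return 1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta + 0.6 * mve_tl + 1.0 * sales_ta

def altman_zone(z):
    """Commonly cited zones for the original model."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"

# Hypothetical firm
z = altman_z(wc_ta=0.10, re_ta=0.15, ebit_ta=0.08, mve_tl=0.9, sales_ta=1.1)
print(round(z, 2), altman_zone(z))
```

The grey zone plays the same role as the "marginal applicants" in credit screening: cases that the function cannot classify confidently and that need closer analysis.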

Sagnik Biswas

Marketing 2