Rather than discussing the various methods taught in class, I thought today I'd review some of the facts related to regression, which form the basic knowledge required to draw conclusions from the output of a Discriminant Analysis.
Regression analysis is used to produce an equation that will predict a dependent variable using one or more independent variables. This equation has the form
Y = b1X1 + b2X2 + ... + A
where Y is the dependent variable being predicted;
X1, X2 and so on are the independent variables being used to predict it;
b1, b2 and so on are the coefficients or multipliers that describe the size of the effect the independent variables are having on the dependent variable Y, and
A is the value Y is predicted to have when all the independent variables are equal to zero.
Suppose we have a regression equation for the dependent variable,
PRICE = -294.1955 (mpg) + 1767.292 (foreign) + 11905.42
This tells us that price is predicted to increase by 1767.292 when the foreign variable goes up by one, to decrease by 294.1955 when mpg goes up by one, and to be 11905.42 when both mpg and foreign are zero.
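As a quick sketch (Python, not part of the original analysis), we can plug hypothetical values such as mpg = 20 and foreign = 1 into this equation to get a predicted price:

```python
# Prediction function built from the fitted equation above:
# PRICE = -294.1955*mpg + 1767.292*foreign + 11905.42
def predict_price(mpg, foreign):
    """Predicted price for a car with the given mpg and foreign indicator
    (0 = domestic, 1 = foreign)."""
    return -294.1955 * mpg + 1767.292 * foreign + 11905.42

# Hypothetical car: 20 mpg, foreign-made
price = predict_price(20, 1)   # -5883.91 + 1767.292 + 11905.42
```

Setting both independent variables to zero returns the constant, 11905.42, exactly as described.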
Coming up with a prediction equation like this is useful only if the independent variables in the dataset have some correlation with the dependent variable. So, in addition to the prediction components of our equation--the coefficients on our independent variables (betas) and the constant (alpha)--we need some measure of how strongly each independent variable is associated with our dependent variable.
When running the regression model, we are trying to discover whether the coefficients on our independent variables are really different from 0 (so the independent variables are having a genuine effect on our dependent variable) or if alternatively any apparent differences from 0 are just due to random chance. The null (default) hypothesis is always that each independent variable is having absolutely no effect (has a coefficient of 0) and we are looking for a reason to reject this theory.
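A minimal sketch of this hypothesis test for a one-variable regression (Python with simulated data; the numbers, variable names, and use of scipy are my illustrative assumptions, not from the post): we fit the slope, compute its standard error, and get a two-sided p-value under the null hypothesis that the true coefficient is 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 * x + rng.normal(0, 1.5, n)   # true slope is 2, so the null should be rejected

# Least-squares slope and intercept
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()

# Residual variance and standard error of the slope
resid = y - (a + b * x)
sigma2 = np.sum(resid ** 2) / (n - 2)
se_b = np.sqrt(sigma2 / sxx)

# t-statistic and two-sided p-value under H0: coefficient = 0
t_stat = b / se_b
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```

A small p-value here is the "reason to reject" the null hypothesis that the paragraph above describes; a large one means any apparent effect could plausibly be random chance.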
In simple or multiple linear regression, the size of the coefficient for each independent variable gives the magnitude of the effect that variable is having on the dependent variable, and the sign on the coefficient (positive or negative) gives the direction of the effect. In regression with a single independent variable, the coefficient tells how much the dependent variable is expected to increase (if the coefficient is positive) or decrease (if the coefficient is negative) when that independent variable increases by one. In regression with multiple independent variables, the coefficient tells how much the dependent variable is expected to increase when that independent variable increases by one, holding all the other independent variables constant. We also have to keep in mind the units in which our variables are measured.
The R-squared of the regression is the fraction of the variation in the dependent variable that is accounted for (or predicted by) the independent variables. (In regression with a single independent variable, it is the same as the square of the correlation between the dependent and independent variable.)
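This identity can be checked numerically. The sketch below (Python, simulated data; the setup is mine, not from the post) fits a one-variable regression and compares the R-squared of the fit with the squared correlation between the two variables:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3.0 * x + rng.normal(0, 2.0, 100)

# Fit simple linear regression by least squares
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

# R-squared: fraction of variation in y explained by the fitted line
rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
r_squared = 1 - rss / tss

# Squared correlation between x and y
r2_from_corr = np.corrcoef(x, y)[0, 1] ** 2
```

For simple regression the two quantities agree exactly (up to floating-point error); with multiple independent variables, R-squared still measures explained variation but is no longer a single squared correlation.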
With this blog, I hope to have covered some of the key facts underlying the regression model and its analysis.
EMI JAVAHARILAL (13134)
Operations Group 2