Monday, 29 August 2011

The Chi Square

The chi-square test is a statistical test for examining differences among categorical variables. It is used in two similar but distinct circumstances:

  • For estimating whether two random variables are independent
  • For estimating how closely an observed distribution matches an expected distribution (a "goodness-of-fit" test)

The Two Types of Analysis

Test for Association: The test for association is a non-parametric test of statistical significance (and can therefore be used with nominal data) that is widely used in bivariate tabular association analysis. Typically, the hypothesis is whether or not two populations differ in some characteristic or aspect of their behavior, based on two random samples. This test procedure is also known as the Pearson chi-square test.

The chi-square test is needed when the data are categorical (or nominal) in nature. The measure is based on the fact that we can compute the expected frequencies in a two-way table (i.e., the frequencies we would expect if there were no relationship between the variables). For example, suppose we ask 20 males and 20 females to choose between two brands of soda pop (brands A and B). If there is no relationship between preference and gender, then we would expect about the same split between brand A and brand B for each sex. The chi-square test becomes increasingly significant as the numbers deviate further from this expected pattern; that is, the more the pattern of choices for males differs from that for females.
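The soda pop example can be sketched in a few lines of Python. The observed counts below are hypothetical (the post does not give any), and the function names are made up for illustration. The sketch computes the expected frequencies from the row and column totals, then the Pearson statistic without any continuity correction. For a 2x2 table there is 1 degree of freedom, in which case the p-value can be obtained from the complementary error function.

```python
import math

def expected_counts(observed):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical counts. Rows: males, females. Columns: brand A, brand B.
observed = [[14, 6],
            [7, 13]]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)

# Valid only for 1 degree of freedom (a 2x2 table).
p_value = math.erfc(math.sqrt(statistic / 2))

print("expected:", expected)
print("chi-square:", round(statistic, 3), "p-value:", round(p_value, 4))
```

With these made-up counts the p-value falls below 0.05, so the brand preference would be judged to differ between the sexes at that level.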

Goodness of Fit Test: The goodness-of-fit test is used to test whether an observed distribution conforms to a particular distribution. The test statistic is calculated by comparing the observed data with the data expected under that distribution.
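As a minimal sketch of the goodness-of-fit calculation, the Python below compares hypothetical counts from 60 rolls of a die against the uniform distribution expected of a fair die. The data and the function name are illustrative assumptions, and the degrees of freedom are taken as the number of categories minus one, which assumes no distribution parameters were estimated from the data.

```python
def goodness_of_fit_statistic(observed, expected_proportions):
    """Return the chi-square statistic and degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1  # assumes no parameters estimated from the data
    return statistic, dof

# Hypothetical counts for faces 1..6 over 60 rolls.
observed_rolls = [8, 12, 9, 11, 6, 14]
fair_die = [1 / 6] * 6

gof_statistic, gof_dof = goodness_of_fit_statistic(observed_rolls, fair_die)
print("chi-square:", round(gof_statistic, 3), "dof:", gof_dof)
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (11.07 at the 0.05 level for 5 degrees of freedom); a smaller statistic gives no evidence against the assumed distribution.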

Requirements of Chi-Square Tests

  • Data is typically attribute-based (discrete). At the very least, every observation must be assignable to one category or another
  • Expected cell counts should not be low (definitely not less than 1 and preferably not less than 5), as this could lead to a false positive indication that there is a difference when, in fact, none exists
  • The only assumption underlying the use of the chi-square test (other than random selection of the sample) is that the expected frequencies are not very small. The reason is that the chi-square test inherently tests the underlying probabilities in each cell, and when the expected cell frequencies fall below 5, for example, those probabilities cannot be estimated with sufficient precision.
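The rule of thumb on expected cell counts can be checked before running the test. The helper below is a hypothetical sketch: its name, return labels, and default thresholds simply encode the "not less than 1, preferably not less than 5" guidance from the list above.

```python
def check_expected_counts(expected, hard_minimum=1.0, preferred_minimum=5.0):
    """Classify a table of expected counts against the rule of thumb."""
    smallest = min(e for row in expected for e in row)
    if smallest < hard_minimum:
        return "too small"   # the test should not be relied on
    if smallest < preferred_minimum:
        return "borderline"  # interpret the result with caution
    return "ok"

# Hypothetical tables of expected counts.
print(check_expected_counts([[10.5, 9.5], [10.5, 9.5]]))
print(check_expected_counts([[3.0, 7.0], [6.0, 14.0]]))
print(check_expected_counts([[0.5, 4.5], [4.5, 40.5]]))
```

When a table comes back as "too small" or "borderline", common remedies are to collect more data or to merge sparse categories.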

Applications of Chi-Square Test

The chi-square test of independence is a very useful tool for any predictive analytics professional. In marketing research, the test comes in most handy when analyzing cross tabulations of survey data. Since crosstabs show the frequency and percentage of responses to questions by different categories of respondents (gender, income, profession, etc.), the chi-square test can tell us whether there is a statistically significant difference between the categories in how they answered the question. Typical applications include:

  • To verify the influence of gender on purchase decisions. For example, are men the primary decision makers when it comes to purchasing big-ticket items? Is gender a factor in the color preference for a car?
  • To test if altering the product mix (% of upscale, mid-range and volume items, say) has impacted profits by comparing sales revenues of each product type before and after the change in product mix
  • To determine if certain types of products sell better in certain geographic locations than others. For example, the type of shoes sold in winter depends strongly on whether a retail outlet is located in the north versus in the south
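Before any of these questions can be tested, the raw survey or sales records have to be tallied into a cross tabulation. The sketch below shows one way to do that in plain Python; the records (region and shoe type) are invented for illustration, and the function name is an assumption. It also reports the degrees of freedom of the resulting table, (rows - 1) x (columns - 1).

```python
from collections import Counter

def crosstab(records):
    """Tally (row_category, column_category) pairs into a two-way table."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

# Hypothetical sales records: (store region, shoe type sold).
records = (
    [("north", "boots")] * 30 + [("north", "sandals")] * 10
    + [("south", "boots")] * 12 + [("south", "sandals")] * 28
)

rows, cols, table = crosstab(records)
dof = (len(rows) - 1) * (len(cols) - 1)

print("columns:", cols)
for name, counts_row in zip(rows, table):
    print(name, counts_row)
print("degrees of freedom:", dof)
```

The resulting table of counts is the input to the test for association described earlier.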


Group No.: Group 4 – Marketing

Author: Kanishka Pasari
