Saturday, 3 September 2011

Discriminant analysis

Discriminant analysis is a statistical technique for classifying objects into mutually exclusive and exhaustive groups based on a set of measurable features of those objects. The technique goes by different names in different fields of study; it is also often called pattern recognition, supervised learning, or supervised classification. This tutorial gives an overview of Linear Discriminant Analysis (LDA). When there are more than two classes, the technique is sometimes called Multiple Discriminant Analysis.


The purpose of discriminant analysis is to classify objects (people, customers, things, etc.) into one of two or more groups based on a set of features that describe the objects (e.g. gender, age, income, weight, preference score). In general, we assign an object to one of a number of predetermined groups based on observations made on the object.

The groups are known or predetermined and have no order (i.e. they are on a nominal scale). The classification problem supplies several objects, each with a set of features measured from it. We are looking for two things:

1. Which set of features can best determine group membership of the object?

2. What is the classification rule or model to best separate those groups?

Difference between clustering and discriminant analysis.


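|                             | Cluster Analysis                                  | Discriminant Analysis                             |
|-----------------------------|---------------------------------------------------|---------------------------------------------------|
| Other name                  | Unsupervised learning                             | Supervised learning                               |
| Training or learning period | Object category is unknown                        | Object category is known                          |
| Purpose of training         | To know the category of each object               | To know the classification rule                   |
| After training (usage)      | To classify objects into a number of categories   | To classify objects into a number of categories   |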

In clustering, the category of each object is unknown. However, we know the rule used to classify (usually based on distance), and we also know the features (independent variables) that describe the objects. There are no training examples against which to check whether a classification is correct, so objects are assigned to groups purely on the basis of the given rule.
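As a minimal sketch of grouping by a distance rule alone, the Python below assigns each object to the nearest of a set of group centers. The sample points, the starting centers, and the function name `assign_to_nearest` are illustrative assumptions, not part of the original tutorial; a full clustering method such as k-means would also update the centers repeatedly.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of the closest center (Euclidean distance)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical objects, each described by two features.
points = [(1.0, 1.2), (0.8, 0.9), (5.1, 4.9), (4.8, 5.3)]
# Hypothetical group centers; no labeled training examples are involved.
centers = [(1.0, 1.0), (5.0, 5.0)]

labels = assign_to_nearest(points, centers)
print(labels)
```

Note that nothing here checks whether an assignment is "correct": the grouping follows from the distance rule and the chosen centers only.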

In discriminant analysis, the groups are known, along with several training examples of objects that have already been grouped. The form of the classification model is also given (for example, linear or quadratic), and we want to find the parameters of the model that best separate the objects based on the training samples.
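To make "finding the parameters from training samples" concrete, here is a rough sketch of a two-group linear discriminant for objects with two features. It assumes equal prior probabilities and a shared (pooled) covariance matrix for both groups; the training data and all function names are hypothetical and chosen only for illustration.

```python
def mean_vector(rows):
    """Mean of each of the two features over the rows of one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(groups, means):
    """Pooled within-group 2x2 covariance matrix."""
    dof = sum(len(g) for g in groups) - len(groups)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for g, m in zip(groups, means):
        for r in g:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    cov[i][j] += d[i] * d[j] / dof
    return cov

def invert_2x2(a):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def fit_lda(group_a, group_b):
    """Learn weights w and threshold c; x belongs to group A when w.x > c."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    inv = invert_2x2(pooled_covariance([group_a, group_b], [ma, mb]))
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # With equal priors the boundary passes through the midpoint of the means.
    mid = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    c = w[0] * mid[0] + w[1] * mid[1]
    return w, c

def classify(x, w, c):
    return "A" if w[0] * x[0] + w[1] * x[1] > c else "B"

# Hypothetical training examples whose groups are already known.
group_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5)]
group_b = [(6.0, 1.0), (7.0, 2.0), (6.5, 0.5), (7.5, 1.5)]

w, c = fit_lda(group_a, group_b)
print(classify((3.0, 4.0), w, c), classify((7.0, 1.0), w, c))
```

Here the training stage is `fit_lda`, which estimates the parameters `w` and `c`; afterwards `classify` applies the learned rule to new objects.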

Clustering and discriminant analysis differ only in the training stage. Once the parameters are determined and we start to use the model, both serve the same purpose: to classify objects into a number of categories.

Discriminant analysis has been used successfully in many applications. As long as a problem can be transformed into a classification problem, the technique may apply. You can also use discriminant analysis for original applications if you have a new combination of features and objects that other people may never have considered. Here are a few fields and examples:

Identification: To identify the type of customer who is likely to buy a certain product in a store. Using a simple questionnaire survey, we can collect the features of customers. Discriminant analysis then helps us select which features describe membership in the "buy" or "not buy" group.

Decision making: A doctor diagnosing an illness can be seen as deciding which disease the patient has. We can transform this into a classification problem by assigning the patient to one of a number of possible disease groups based on the observed symptoms.

Prediction: The question "Will it rain today?" can be thought of as a prediction. The prediction problem can be treated as assigning "today" to one of two possible groups, rain or dry.

Pattern recognition: Distinguishing pedestrians from dogs and cars in a captured image sequence of traffic data is a classification problem.

Learning: Teaching a robot to talk can be seen as a classification problem. The robot assigns frequency, pitch, tune, and many other measurements of sound to many groups of words.

Bubble chart

A Bubble chart is a variation of a Scatter chart in which the data points are replaced with bubbles. A Bubble chart can be used instead of a Scatter chart if your data has three data series, each of which contains a set of values. For example, a worksheet might contain values for three types of data: number of products, dollar value of sales, and percentage size of market share.

When to use a Bubble chart

Bubble charts are often used to present financial data. Use a Bubble chart when you want specific values to stand out visually through different bubble sizes. Bubble charts are useful when your worksheet has any of the following types of data:

Three values per data point: Three values are required for each bubble. These values can be in rows or columns on the worksheet, but they must be in the following order: x value, y value, and then size value.

Negative values: Bubble sizes can represent negative values, although negative bubbles do not display in the chart by default. You can choose to display them by formatting that data series. When they are displayed, bubbles with negative values are colored white (which cannot be modified) and the size is based on their absolute value. Even though the size of negative bubbles is based on a positive value, their data labels will show the true negative value.

Multiple data series: Plotting multiple data series in a Bubble chart (multiple bubble series) is similar to plotting multiple data series in a Scatter chart (multiple scatter series). While Scatter charts use a single set of x values and multiple sets of y values, Bubble charts use a single set of x values and multiple sets of both y values and size values.
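The data layout described above can be sketched in a few lines of Python. This is only an illustration of the x, y, size ordering and of the rule for negative sizes; it does not call any charting library, and the sample figures, the `Bubble` type, and the `display_properties` helper are hypothetical.

```python
from typing import NamedTuple

class Bubble(NamedTuple):
    x: float      # first value, e.g. number of products
    y: float      # second value, e.g. dollar value of sales
    size: float   # third value, e.g. market share; may be negative

def display_properties(bubble, show_negative=False):
    """Return (drawn_size, label), or None when a negative bubble stays hidden."""
    if bubble.size < 0 and not show_negative:
        return None  # negative bubbles are not displayed by default
    # The drawn size uses the absolute value; the label keeps the true value.
    return abs(bubble.size), str(bubble.size)

# Hypothetical rows, each in the required order: x value, y value, size value.
rows = [(14, 12200, 15), (20, 60000, 33), (18, 24400, -10)]
bubbles = [Bubble(*row) for row in rows]

for b in bubbles:
    print(b, display_properties(b, show_negative=True))
```

With `show_negative` left at its default, the third bubble would be skipped, which matches the default behavior described above.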

Author :- Saeesh Dhond

Group :- Operation 3
