Saturday, 3 September 2011

Figure 1

I would like to explain what a “BUBBLE CHART” is, how it can be used and interpreted, what the disadvantages of a bubble chart are and how it can help in carrying out further analyses; all this through an example shared right here.

Bubble charts are popular tools for identifying and illustrating industry clusters. Essentially, these charts allow 4 different variables to be plotted within the same graph, making it easy to assess relative economic performance. Because they allow visual comparisons of well-understood measures, bubble charts are often used for pinpointing priority industries that should receive attention from a state economic development agency.

Bubble charts: what they are

Figure 1 illustrates industry cluster relationships for the 17 Pennsylvania targeted industry clusters (CWIA 2004).

The following four variables are plotted in this single graphic:

1. Average cluster wages in 2002; on the x-axis (horizontal)

2. Growth in jobs, 1998 to 2002; on the y-axis (vertical)

3. Employment size of the industry, 2002; indicated by the size of the bubble

4. The industry’s location quotient, 2002; indicated by the color of the bubble

With user-defined demarcations, location quotients show whether a state or region is more specialized (>1.1149), less specialized (<0.95) or as specialized in a particular industry as is the nation or the reference region.

In this graphic,

clusters in which the state is more specialized than the nation - shown in red

clusters with less specialization - shown in green

clusters with average specialization - shown in blue.
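
As a rough illustration of the fourth variable, a location quotient is usually computed as the industry's share of regional employment divided by its share of national employment. The sketch below assumes that definition and reuses the cut-offs and colours quoted above; the function names and the employment figures are hypothetical and are not taken from the Pennsylvania data.

```python
def location_quotient(region_industry_jobs, region_total_jobs,
                      nation_industry_jobs, nation_total_jobs):
    """Industry share of regional jobs divided by its share of national jobs."""
    regional_share = region_industry_jobs / region_total_jobs
    national_share = nation_industry_jobs / nation_total_jobs
    return regional_share / national_share


def bubble_colour(lq, upper=1.1149, lower=0.95):
    """Map a location quotient to the colour scheme described for Figure 1."""
    if lq > upper:
        return "red"    # more specialized than the reference region
    if lq < lower:
        return "green"  # less specialized
    return "blue"       # about as specialized


# Hypothetical figures: 30,000 of 1,000,000 regional jobs versus
# 2,000,000 of 100,000,000 national jobs in the same industry.
lq = location_quotient(30_000, 1_000_000, 2_000_000, 100_000_000)
colour = bubble_colour(lq)
print(round(lq, 2), colour)
```

Because the demarcations are user-defined, the two thresholds are kept as parameters rather than hard-coded.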

Bubble charts: how they are used

Bubble charts show the most important clusters in a state or region as measured by-

total employment size (the bigger the bubble the better),

recent job growth (the further up in the graph the better), and

high-paying jobs (the further to the right in the graph, the better).

Depending on the state’s economic development objective – that is, whether the goal is to create more jobs or better-paying jobs, or both – the state agency responsible for economic development might choose to concentrate on industries with large bubbles or industries located in the right-hand side of the graphic. To many, the ideal is to focus on rapidly-growing, high-paying industries depicted in the upper right-hand corner of the graph.

For example, the biomedical industry was promising because it paid relatively high wages, had shown substantial growth in the last 5 years, and had a red location quotient (indicating that the state had some locational advantage in the industry). However, it was also a relatively small industry in terms of employment size, in contrast to, for example, Life Sciences and Health Care.

While bubble charts can help identify “promising” clusters, an important shortcoming of this analysis is that they can’t identify “why” a particular region has an advantage. For example, is a cluster strong in a region because of access to resources or markets? Or, does the region possess a particularly skilled labor force?

Because bubble chart analysis can’t answer the “why” and “how” questions, it should be seen as an important part of economic development analyses, but it is only one part of the process.

Nonetheless, bubble charts are a good starting place for any discussion about cluster-based economic development policies. For example, in their cluster work with communities, the first question after identifying an important cluster was usually some variant of “why here?” That opened the door for rich conversations about the potential causes of local competitive advantage. And lessons can also be learnt from declining clusters: the 2nd question they usually got was “what’s going on?” as audiences look at historically important clusters that seem to be in decline.

For example, the decline in printing employment (high LQ) tends to raise questions about off-shoring and the supposed transition to a paperless, digital economy.

On that note, I would like to conclude my blog by adding that bubble charts provide not only a method for identifying clusters, but also an entrée for discussing more advanced topics, both theoretical (why?) and practical (how?).

Posted by-


Operations Group 2

Microsoft Excel – Graphically, Not the Best Software!!!

On the evening of the 1st of September, I was told that SPSS is not the right software for making graphs and charts. I went through the limited functionalities it offers and realized that it was good, but not as good as Microsoft Excel. Sure enough, inputting the same data in Excel and generating the charts showed this difference. The charts turned out to be more vivid and colourful and were a treat to the eye. Even the bubble graphs looked quite realistic. To add to that, the ease of making charts in Excel made it a hands-down winner over SPSS, if one only looks at graphical data presentation. Then I wondered, is there software that could do the job even BETTER? Or, is there an alternative to SPSS that could make attractive charts and graphs as well? The answer – A RESOUNDING YES!!!

But before that, why is graphical presentation important? Graphs and charts give life to the data at hand. Data could look quite boring and dull when presented in the form of tables. A part of the data could, at times, be very difficult to understand as well. However, the main use of graphical presentation of data is to illustrate comparison. Tables, in such cases, tend to show the comparison in absolute terms. What I mean is that it shows the different figures but the extent of variations cannot be fully understood. Charts present a more relative picture. The sudden drop in the size of the bar or the sudden fall of a line signifies a much larger picture that could just not be depicted by using numbers in the form of a table.

Bar charts and pie diagrams are passé. There are various new types of charts available. Excel has a few of those. However, there are software packages which have as many, if not more. Plus, they are visually appealing as well. Some of them are described below.


Going by the descriptions on the product’s website, SigmaPlot could be a very good replacement for SPSS. However, this should not be taken as a general verdict, as I have not used SigmaPlot and am not an expert in SPSS; it comes only from a novice’s point of view. SigmaPlot allows you to run a whole range of tests on your data (from a simple Chi-Square to more complicated ones like Fisher's Exact Test, McNemar's Test, Relative Risk and Odds Ratio). It also allows you to perform correlation and regression and offers survival analysis methods such as Kaplan-Meier and Cox Regression. At the same time, the data can be transformed just as it can be in SPSS.

However, the type and quality of charts is much better than in SPSS. It features bar graphs, line graphs, pie charts, histograms, time series plots, box plots, scatter diagrams, bubbles, needles, contours and waterfalls. There are various additional features like 3D rotation, reference lines, error bars, function plotter and multiline text editor. One can also export the charts to various formats or even save as a web page.

Some of the screenshots of the charts are given below:


If one is looking at an application only for chart-making and not for advanced statistics, FusionCharts may be a good software to look at. FusionCharts is the product of InfoSoft Global (P) Ltd., an ISV headquartered in Kolkata, India. InfoSoft Global is the licensor of the FusionCharts Suite and all other data visualization solutions that are distributed under the FusionCharts brand.

FusionCharts is a Flash charting component that can be used to render data-driven and animated charts for your web applications and presentations. FusionCharts is created in Adobe Flash 8 and can be used with any web scripting language like HTML, .NET, ASP, JSP, PHP, ColdFusion, Ruby on Rails etc., to deliver interactive and powerful Flash charts. It has been described as “A set of Macromedia Flash (SWF) files that helps you to create eye-catching animated graphs.”

FusionCharts is a comprehensive charting suite with over 75 chart types and 500 maps which can be rendered both in 2D and 3D. Also, it renders stunning charts with animation and interactivity that cannot be matched by most server-side charting components.

As mentioned earlier, the charts rendered are highly interactive with drill-down to unlimited levels, tooltips, chart export and visual editing. Furthermore, the charts automatically find out the best position for numbers, captions and labels even when they are very short, very long or there are too many of them. The charts can also be exported to JPEG, PNG and PDF formats. Huge quantities of data can be presented using a zoom chart which has a macroscopic view and can be zoomed in. In addition to all the general purpose charts, FusionCharts offers advanced charts like Combination, Scroll, Zoom Line, XY Plot, Marimekko and Pareto charts. Even the analysis is much easier using highlights, colour ranges and trend lines.

A screenshot of the FusionCharts as a Web Application is given below:


Another software that one can lay their hands on is Charts & Graphs by SummitSoft. Charts & Graphs has a wide variety of styles and powerful editing options to quickly take your charts to the next level. The program allows you to create and edit stunning 3-D charts and graphs without having to switch back and forth from your data files. It’s as easy as 1-2-3:

  1. Select a chart or graph style
  2. Enter or import your data, adjust colours and effects
  3. Use in your document or export as an image, PDF and other formats

Although Microsoft Office needs to be present in your system to run the software, Charts & Graphs will take your presentations, reports and documents to a new level. Charts & Graphs allows you to instantly rotate any 3-D chart in real-time for perfect presentation. You can also explode (separate) all or individual parts of a chart for greater impact. Multi-layered data charts can also be prepared using this software. Since the data resides with the chart or graph, one does not need to move between applications to modify – it can be done directly from this software.

Charts & Graphs allows you to create stunning charts, one of which is illustrated below:

Data presentation is very important for organizations today as one is moving away from heavily-loaded reports to more lucid presentations. It therefore becomes critical to attract attention using visually appealing and interactive charts for a much greater impact. Although Microsoft Excel has managed to do it quite well, there are still a few gaps in this area which have been filled by other software. It is time that one looks beyond Excel for these needs. Will Excel be able to do it as well as the other software in the days to come? Only time will tell.


Author: Kanishka Pasari

Group: Marketing – Group 4

Discriminant analysis application for a mutual fund product

If I were to sell somebody a financial product, how hard would it be? Well, very hard. Believing in a financial product is very difficult. This was the agony of a newly established mutual fund which came up with a mutual fund product. It wanted to know whether the product would be well accepted by customers. Therefore, the company did a discriminant analysis. It selected a sample of people to whom certain questions were asked. As the questions were posed for this purpose only, they were highly relevant. The questions asked, which were also the independent variables, were:

a). The information about such product

b). Income effect on the investment decision

c). Number of dependent members in a family

d). The level of exposure to other financial products

The grouping variable was the level of acceptance of the mutual fund product, with highly acceptable coded as 1 and not acceptable coded as 2.

The result was good in terms of accuracy: 79% of cases were classified correctly, and anything clearly above the 50% expected by chance for two equally likely groups is considered good accuracy.

The table of unstandardized coefficients shows how much each variable influenced the dependent variable, which in this case was the acceptability of the mutual fund product. The company found that information about the mutual fund product had the greatest influence. Wilks’ Lambda was about 0.92 and the eigenvalue about 0.08. Strictly speaking, a Lambda this close to 1 and an eigenvalue this small point to a modest separation between the two groups rather than a strong one, so the grouping should be read with some caution.
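
The two statistics quoted here are linked. For an analysis with a single discriminant function (as with two groups), Wilks' Lambda equals 1 / (1 + eigenvalue), and the canonical correlation is the square root of eigenvalue / (1 + eigenvalue). The following is only a minimal sketch of that arithmetic using the reported eigenvalue of 0.08; the function names are my own and not SPSS output labels.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda as the product of 1 / (1 + eigenvalue) over all functions."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result *= 1.0 / (1.0 + eigenvalue)
    return result


def canonical_correlation(eigenvalue):
    """Correlation between the discriminant scores and group membership."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


lam = wilks_lambda([0.08])        # single function, eigenvalue from the post
rc = canonical_correlation(0.08)
print(round(lam, 3), round(rc, 3))
```

Lambda runs from 0 (groups perfectly separated) to 1 (no separation), so smaller values indicate stronger discrimination.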

The end result of this exercise was that insufficient information reaching people about these products is the major reason such products are not accepted.

The company did a first-level analysis by checking the frequency of each response among the respondents, and thereby the reliability of the data collected and analysed.

The company had to make a strategic decision as to how to increase awareness of the product. It came out with different forms of print and electronic advertisements to provide sufficient information about the product so that people would find it acceptable. The company promoted its disclosure norms and thereby built loyalty among people for its product. Product sales are definitely high, but the full extent of sales is yet to be seen, with the information flow still expanding.

Name: Sakshi Tripathi

Specialization: Finance

Group: 6

Predicting Dividend payments using Radar Graphs

We will be mapping different variables which affect the dividend policy of a company. The dependent variable (or grouping variable) would be whether a dividend was paid in the last 3 years: Yes or No. The independent variables are factors like-

1. Earnings Stability

2. Funds Liquidity

3. Past Dividend Rates

4. Ability to Borrow

5. Impact of Govt Policies

All of these factors are rated on a scale of 5. Discriminant analysis is run across these variables to check the variability within the groups and between the groups (Eigenvalues and Wilks’ Lambda). Also, we can see the different correlations which affect the dividend policy of the firms. We can make tables to check the data on 4 such companies from BSE/NSE.

This can be represented in the form of a Radar Graph. Radar graphs are similar to line graphs, except that they use a radial grid to display data items. A radial grid displays scale value grid lines circling around a central point, which represents zero. Higher data values are farther from the center point.
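
To make the radial grid concrete, here is a small sketch of how the ratings for one company could be turned into the corner points of a radar polygon. It assumes the five factors listed earlier, a 0 to 5 rating scale, and axes spaced evenly clockwise starting from the top; the ratings themselves are hypothetical and are not data for any BSE/NSE company.

```python
import math


def radar_vertices(scores, max_score=5.0):
    """Return the (x, y) corner points of a radar polygon.

    The centre (0, 0) represents zero; each axis has length 1 at max_score.
    The first axis points straight up and the rest follow clockwise.
    """
    n = len(scores)
    points = []
    for i, score in enumerate(scores):
        angle = math.pi / 2 - 2 * math.pi * i / n
        radius = score / max_score
        points.append((radius * math.cos(angle), radius * math.sin(angle)))
    return points


factors = ["Earnings Stability", "Funds Liquidity", "Past Dividend Rates",
           "Ability to Borrow", "Impact of Govt Policies"]
company_ratings = [5, 3, 5, 2, 4]  # hypothetical ratings on the 5-point scale
vertices = radar_vertices(company_ratings)
for name, (x, y) in zip(factors, vertices):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Joining the points in order, and closing the shape back to the first point, gives the outline that a charting tool would draw for that company.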

This graph shows that company 1 has the highest earnings stability and the highest past dividend rates. By extending this logic, we can predict from these graphs the kind of companies which are likely to pay high dividends. Radar graphs are thus very simple to interpret and a powerful way of representing the data.

Vyom Saini (13114)

Fin Grp-6

Discriminant analysis

Discriminant analysis is a statistical technique to classify objects into mutually exclusive and exhaustive groups based on a set of measurable features of the objects. The technique goes by different names in different fields of study: it is also often called pattern recognition, supervised learning, or supervised classification. This tutorial gives an overview of Linear Discriminant Analysis (LDA). If the number of classes is more than two, it is also sometimes called Multiple Discriminant Analysis.


The purpose of Discriminant Analysis is to classify objects (people, customers, things, etc.) into one of two or more groups based on a set of features that describe the objects (e.g. gender, age, income, weight, preference score, etc. ). In general, we assign an object to one of a number of predetermined groups based on observations made on the object.

The groups are known or predetermined and do not have an order (i.e. nominal scale). The classification problem presents several objects together with a set of features measured on those objects. What we are looking for is two things:

1. Which set of features can best determine group membership of the object?

2. What is the classification rule or model to best separate those groups?

Difference between clustering and discriminant analysis

                             | Cluster Analysis                                | Discriminant Analysis
Other name                   | Unsupervised learning                           | Supervised learning
Training or learning period  | Object category is unknown                      | Object category is known
Purpose of training          | To know the category of each object             | To know the classification rule
After training (usage)       | To classify objects into a number of categories | To classify objects into a number of categories

In clustering, the category of the object is unknown. However, we know the rule to classify (usually based on distance) and we also know the features (independent variables) that can describe the classification of the object. There is no training example to examine whether the classification is correct or not. Thus, the objects are assigned into groups merely based on the given rule.

In discriminant analysis, object groups and several training examples of objects that have been grouped are known. The model of classification is also given (for example, linear or quadratic) and we want to know the best fit parameters of the model that can best separate the objects based on the training samples.

The differences between clustering and discriminant analysis lie only in the training stage. Once the parameters are determined and we start to use the model, both have the same use: classifying objects into a number of categories.

Discriminant analysis has been successfully used for many applications. As long as we can transform the problem into a classification problem, we may apply the technique. You can use discriminant analysis for original applications if you have a new combination of features and objects that may never have been considered by other people before. Here are a few fields and examples:

Identification: To identify the type of customers who are likely to buy a certain product in a store. Using a simple questionnaire survey, we can get the features of customers. Discriminant analysis will help us to select which features best describe membership of the "buy" or "not buy" group.

Decision making: A doctor diagnosing an illness may be seen as deciding which disease the patient has. We can transform this into a classification problem by assigning the patient to one of a number of possible disease groups based on the observed symptoms.

Prediction: The question "Will it rain today?" can be thought of as prediction. The prediction problem can be treated as assigning "today" to one of two possible groups, rain and dry.

Pattern recognition: Distinguishing pedestrians from dogs and cars in a captured image sequence of traffic data is a classification problem.

Learning: Teaching a robot to learn to talk can be seen as a classification problem. It assigns frequency, pitch, tune, and many other measurements of sound to many groups of words.

Bubble chart

A Bubble chart is a variation of a Scatter chart in which the data points are replaced with bubbles. A Bubble chart can be used instead of a Scatter chart if your data has three data series, each of which contains a set of values. For example, the worksheet in the following picture contains values for three types of data: number of products, dollar value of sales, and percentage size of market share.

When to use a Bubble chart

Bubble charts are often used to present financial data. Use a Bubble chart when you want specific values to be more visually represented in your chart by different bubble sizes. Bubble charts are useful when your worksheet has any of the following types of data:

Three values per data point: Three values are required for each bubble. These values can be in rows or columns on the worksheet, but they must be in the following order: x value, y value, and then size value.

Negative values: Bubble sizes can represent negative values, although negative bubbles do not display in the chart by default. You can choose to display them by formatting that data series. When they are displayed, bubbles with negative values are colored white (which cannot be modified) and the size is based on their absolute value. Even though the size of negative bubbles is based on a positive value, their data labels will show the true negative value.

Multiple data series: Plotting multiple data series in a Bubble chart (multiple bubble series) is similar to plotting multiple data series in a Scatter chart (multiple scatter series). While Scatter charts use a single set of x values and multiple sets of y values, Bubble charts use a single set of x values and multiple sets of both y values and size values.
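
The data rules above can be sketched outside Excel as well. The snippet below is only an illustration: it assumes rows in the required (x, y, size) order, scales each bubble so that its area is proportional to the absolute size value (one common convention, taken here as an assumption), and keeps the true signed value for the label. The figures for number of products, sales and market share are hypothetical.

```python
import math


def bubble_radius(size_value, max_abs_size, max_radius=40.0):
    """Radius chosen so that bubble AREA is proportional to |size_value|."""
    return max_radius * math.sqrt(abs(size_value) / max_abs_size)


# Hypothetical rows in the required order: (x, y, size)
# x = number of products, y = sales in dollars, size = market share
rows = [
    (5, 5500.0, 0.03),
    (14, 12200.0, 0.12),
    (20, 60000.0, 0.33),
    (18, 24400.0, -0.10),  # negative size: drawn from its absolute value
]

max_abs = max(abs(size) for _, _, size in rows)
bubbles = [
    {
        "x": x,
        "y": y,
        "radius": bubble_radius(size, max_abs),
        "label": size,          # the label keeps the true (signed) value
        "negative": size < 0,   # flag so the bubble can be styled differently
    }
    for x, y, size in rows
]
for bubble in bubbles:
    print(bubble)
```

Scaling the radius by the square root keeps a bubble with four times the value from looking sixteen times as large.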

Author :- Saeesh Dhond

Group :- Operation 3

Linear discriminant analysis, OLAP cubes and Bubble charts

Linear discriminant analysis

The linear discriminant analysis (LDA) method is used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events.
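
As a hedged illustration of what "finding a linear combination" means, the sketch below implements Fisher's two-class rule for exactly two features: the weights are the inverse of the pooled within-class scatter matrix applied to the difference of the class means, and the cut-off is placed midway between the projected means (which assumes equal priors). The function names and the two small samples are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two features."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter_matrix(rows, mean):
    """2x2 sum of outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_lda(class_a, class_b):
    """Return (weights, cutoff); scores above the cutoff point to class A."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Midpoint between the two projected class means
    cutoff = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, cutoff


def classify(point, w, cutoff):
    """Discriminant score = weighted sum of the features."""
    score = w[0] * point[0] + w[1] * point[1]
    return "A" if score > cutoff else "B"


# Hypothetical data: (liquidity ratio, profit margin) for two groups of firms
survived = [(2.0, 0.30), (2.5, 0.35), (3.0, 0.25), (2.2, 0.40)]   # class A
bankrupt = [(0.8, 0.05), (1.0, 0.10), (1.2, 0.00), (0.6, 0.08)]   # class B
weights, cutoff = fisher_lda(survived, bankrupt)
print(classify((2.4, 0.30), weights, cutoff))
print(classify((0.9, 0.05), weights, cutoff))
```

A statistics package would normally also report standardized coefficients and significance tests; this sketch covers only the weights and the classification rule.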


· Bankruptcy prediction: LDA was used to determine which firms went into bankruptcy and which survived, based on predictions made using financial ratios.

· Face recognition: In computerised face recognition, each face is represented by a large number of pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a more manageable number before classification.

· Marketing: In marketing, discriminant analysis was once often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. Logistic regression or other methods are now more commonly used.

OLAP cube

An OLAP cube (for online analytical processing) is a data structure that allows fast analysis of data. It can also be defined as the capability of manipulating and analyzing data from multiple perspectives. The arrangement of data into cubes overcomes some limitations of relational databases.

There are three reasons for adding a cube to your solution:

· Performance: A cube’s structure and pre-aggregation allow it to provide very fast responses to queries that would otherwise have required reading, grouping and summarizing millions of rows of relational data.

· Drill-down functionality: Many reporting software tools will automatically allow drilling up and down on dimensions when the data source is an OLAP cube. Some tools, like IBM Cognos’ Dimensionally Modeled Relational model, will allow you to use their product on a relational source and drill down as if it were OLAP, but you would not have the performance gains you would enjoy from a cube.

· Availability of software tools: Some client software reporting tools will only use an OLAP data source for reporting. These tools are designed for multi-dimensional analysis and use MDX behind the scenes to query the data.
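
To show the idea of pre-aggregation and drill-down in miniature, here is a toy sketch that is not how any real OLAP server stores data. It pre-computes a total for every combination of dimension values, using "*" to stand for "all values" of a dimension, so that a later query is a single dictionary lookup. The dimension names and sales rows are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every combination of dimension values.

    "*" in a key position means "all values" of that dimension.
    """
    cube = defaultdict(float)
    for row in rows:
        # Each dimension contributes either its actual value or the "*" roll-up
        options = [(row[d], "*") for d in dimensions]
        for key in product(*options):
            cube[key] += row[measure]
    return dict(cube)


# Hypothetical fact rows
sales = [
    {"region": "East", "product": "A", "year": 2010, "amount": 100.0},
    {"region": "East", "product": "B", "year": 2010, "amount": 50.0},
    {"region": "West", "product": "A", "year": 2011, "amount": 70.0},
    {"region": "East", "product": "A", "year": 2011, "amount": 30.0},
]

cube = build_cube(sales, ["region", "product", "year"], "amount")
grand_total = cube[("*", "*", "*")]       # everything rolled up
east_total = cube[("East", "*", "*")]     # roll-up over product and year
east_product_a = cube[("East", "A", "*")] # drill down from East to product A
print(grand_total, east_total, east_product_a)
```

The trade-off is visible even here: the cube answers any of these questions without rescanning the rows, at the cost of storing many more cells than there are source rows.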

Bubble charts

A bubble chart is a type of chart where each plotted entity is defined in terms of three distinct numeric parameters. Bubble charts can facilitate the understanding of social, economic, medical, and other scientific relationships.

Use of bubble charts

· The single best example of good use of bubbles is a scatterplot demonstrating some relationship between the x and y variables. The bubbles are added to demonstrate some size metric to help show whether the relationship holds as well for big or small entities of analysis.

· Another good use of bubbles is to help prioritize decisions coming from diagnostic-type analyses, for example, if we have a growth vs. profitability matrix.

Group – Marketing 3

Author of the Article – Rajiv Venugopal

Descriptive Discriminant Analysis

If you have two or more groups of subjects and several variables about each subject and you want to determine how the groups differ on the variables, you will use descriptive discriminant analysis. Descriptive discriminant analysis shows which variables are best at distinguishing one group from the other.

Function: Descriptive discriminant analysis allows you to describe two or more groups of subjects (e.g. people) in terms of the variables that you have available and in ways that make the differences between the groups as large as possible. It uses information on the means and standard deviations of the variables to create weighted combinations of variables that distinguish the groups.

Types: The two broad types of discriminant analysis are parametric and nonparametric. Parametric discriminant analysis assumes the distribution of each group is multivariate normal. Nonparametric discriminant analysis relaxes this assumption, at some cost in power.

Types of Parametric Discriminant Analysis: The most common type of parametric discriminant analysis is Fisher's linear discriminant analysis, which creates linear combinations of the variables. That is, the value of each variable is multiplied by a constant, and then these products are added together to create a discriminant score. An alternative is quadratic discriminant analysis, which adds quadratic terms.

Types of Nonparametric Discriminant Analysis: Two common types of nonparametric discriminant analysis are kernel and k-nearest neighbor. Kernel discriminant analysis estimates the distribution of variables in each group using one of a variety of complex functions known as kernel density estimates. These are needed because when the distribution of variables is not normal, the mean and standard deviation are not enough to describe the distribution.
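
A rough sketch of the kernel approach for a single variable follows. It assumes a Gaussian kernel, a hand-picked bandwidth of 0.5, and priors proportional to group size; all three are illustrative choices, and the two samples are hypothetical. The "low" group is deliberately two-peaked, which is the kind of shape a mean and standard deviation alone would describe poorly.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density of `sample` at a point x."""
    def density(x):
        total = 0.0
        for xi in sample:
            z = (x - xi) / bandwidth
            total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        return total / (len(sample) * bandwidth)
    return density


def kernel_classify(x, groups, bandwidth=0.5):
    """Assign x to the group with the largest prior-weighted density."""
    n_total = sum(len(sample) for sample in groups.values())
    best_label, best_score = None, -1.0
    for label, sample in groups.items():
        prior = len(sample) / n_total
        score = prior * gaussian_kde(sample, bandwidth)(x)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical one-variable samples
groups = {
    "low": [1.0, 1.5, 2.0, 8.0],
    "high": [4.0, 4.5, 5.0, 5.5],
}
print(kernel_classify(1.2, groups))
print(kernel_classify(4.8, groups))
print(kernel_classify(8.2, groups))
```

The bandwidth controls how smooth the estimated density is, and in practice it is usually chosen by a data-driven rule rather than fixed by hand.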

K-nearest neighbor methods first define "nearness" and then assign each subject to the group that is most common among the k subjects nearest to it.
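
A minimal sketch of the k-nearest neighbor idea is given below. It assumes Euclidean distance as the definition of "nearness" and a simple majority vote among the k = 3 closest training subjects; the training points and group labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(point, training, k=3):
    """Label `point` by majority vote among its k nearest training subjects.

    `training` is a list of (features, label) pairs.
    """
    ranked = sorted(training, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical subjects measured on two variables
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((4.0, 4.2), "B"), ((4.5, 3.8), "B"), ((3.9, 4.4), "B"),
]
print(knn_classify((1.1, 1.0), training))
print(knn_classify((4.2, 4.0), training))
```

Because the method relies on distances, variables measured on very different scales are usually standardized first.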

Author Name: Vijeta Bharadwaj
Group : Finance 2