Monday, 29 August 2011

Business analytics - Session 2

Cluster analysis, or clustering, is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis, information retrieval, and bioinformatics.
The k-means algorithm assigns each point to the cluster whose centre (also called centroid) is nearest. The centre is the average of all the points in the cluster — that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster.
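
To make the two steps just described concrete, here is a minimal Python sketch of one assignment step and one centre update. The function names and the sample points are invented for illustration, and plain Euclidean distance is assumed as the notion of "nearest".

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def assign_to_nearest(points, centres):
    """Return, for each point, the index of the centre closest to it."""
    return [min(range(len(centres)), key=lambda j: dist(p, centres[j]))
            for p in points]

def centroid(cluster_points):
    """Arithmetic mean of the points, taken separately for each dimension."""
    n = len(cluster_points)
    return tuple(sum(coords) / n for coords in zip(*cluster_points))

# Illustrative data only: four 2-D points and two starting centres.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centres = [(0.0, 0.0), (10.0, 10.0)]

labels = assign_to_nearest(points, centres)
new_centres = [centroid([p for p, lab in zip(points, labels) if lab == j])
               for j in range(len(centres))]

print(labels)       # [0, 0, 1, 1]
print(new_centres)  # [(1.0, 1.5), (8.5, 8.0)]
```

Repeating these two steps until the assignments stop changing gives the usual iterative form of k-means.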
In k-means clustering, it often takes several analyses with different settings before the number of clusters can be determined.
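
One common way to run such repeated analyses is to fit k-means for several values of k and compare the within-cluster sum of squares; the notes do not prescribe this, so take it as an assumption. The sketch below is a small self-contained illustration: the data, the function names, and the deterministic choice of starting centres are all made up for the example, and it assumes k is no larger than the number of points.

```python
from math import dist

def kmeans(points, k, max_iter=100):
    """Basic k-means with evenly spaced points as starting centres (an assumption)."""
    step = max(1, len(points) // k)
    centres = points[::step][:k]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: stop
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels

def within_cluster_ss(points, centres, labels):
    """Sum of squared distances from each point to its cluster centre."""
    return sum(dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))

# Illustrative data only: two visibly separate groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

wcss = {}
for k in (1, 2, 3):
    centres, labels = kmeans(points, k)
    wcss[k] = within_cluster_ss(points, centres, labels)
    print(k, round(wcss[k], 3))
```

On this data the sum of squares falls sharply from k = 1 to k = 2 and only slightly after that, which is the kind of pattern an analyst would look for when settling on the number of clusters.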

