BA@SIBMB: An Insight Into Jaccard Measure

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. It measures similarity between sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$J'(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
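The two set-based definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and returning a similarity of 1.0 when both sets are empty is an assumed convention, since the formula itself is undefined in that case.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Complement of the Jaccard similarity coefficient."""
    return 1.0 - jaccard_similarity(a, b)


if __name__ == "__main__":
    A = {1, 2, 3, 4}
    B = {3, 4, 5}
    # The intersection {3, 4} has 2 elements, the union {1, 2, 3, 4, 5} has 5.
    print(jaccard_similarity(A, B))
    print(jaccard_distance(A, B))
```

For these two sets the coefficient is 2/5 and the distance is 3/5, so the two values always add up to 1.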

Jaccard’s coefficient (which measures similarity) and Jaccard’s distance (which measures dissimilarity) are measures for asymmetric information on binary and non-binary variables.

Example: Similarity of asymmetric binary attributes

Given two objects, A and B, each with n binary attributes, the Jaccard coefficient is a useful measure of the overlap that A and B share in their attributes. Each attribute of A and B can be either 0 or 1. The counts for each combination of attribute values in A and B are defined as follows:

M₁₁ represents the total number of attributes where A and B both have a value of 1.

M₀₁ represents the total number of attributes where the attribute of A is 0 and the attribute of B is 1.

M₁₀ represents the total number of attributes where the attribute of A is 1 and the attribute of B is 0.

M₀₀ represents the total number of attributes where A and B both have a value of 0.

Each attribute must fall into one of these four categories, meaning that

$$M_{11} + M_{01} + M_{10} + M_{00} = n$$

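The Jaccard similarity coefficient J is given by

$$J = \frac{M_{11}}{M_{11} + M_{01} + M_{10}}$$

Note that M₀₀ does not appear in the formula: because the attributes are asymmetric, attributes that are 0 in both A and B are not counted as evidence of similarity.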

The Jaccard distance J’ is given by

$$J' = \frac{M_{01} + M_{10}}{M_{11} + M_{01} + M_{10}}$$
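As a rough illustration of the binary-attribute case, the sketch below counts the four combinations for two equal-length lists of 0s and 1s and then applies the formulas for J and J’. The function name and the sample vectors are invented for this example, and raising an error when no attribute is 1 in either object is an assumption, since the ratio is undefined there.

```python
def binary_jaccard(a, b):
    """Return (J, J') for two equal-length sequences of 0/1 attributes."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of attributes")

    pairs = list(zip(a, b))
    m11 = sum(1 for x, y in pairs if x == 1 and y == 1)  # both 1
    m01 = sum(1 for x, y in pairs if x == 0 and y == 1)  # A is 0, B is 1
    m10 = sum(1 for x, y in pairs if x == 1 and y == 0)  # A is 1, B is 0
    m00 = sum(1 for x, y in pairs if x == 0 and y == 0)  # both 0

    # Every attribute falls into exactly one of the four categories.
    assert m11 + m01 + m10 + m00 == len(a)

    denominator = m11 + m01 + m10
    if denominator == 0:
        # Assumption: treat the all-zero case as an error rather than pick a value.
        raise ValueError("J is undefined when no attribute is 1 in A or B")

    similarity = m11 / denominator
    distance = (m01 + m10) / denominator
    return similarity, distance


if __name__ == "__main__":
    A = [1, 0, 1, 1, 0, 0]
    B = [1, 1, 0, 1, 0, 0]
    # Here M11 = 2, M01 = 1, M10 = 1 and M00 = 2.
    print(binary_jaccard(A, B))
```

With these vectors J = 2/4 and J’ = 2/4; the two shared zeros increase n but leave both values unchanged.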

Sakshi Goel

13100

Finance Group-1

Monday, 29 August 2011