The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. It measures similarity between sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
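As a rough illustration of the definition above, here is a minimal Python sketch. The function name `jaccard_similarity` and the sample sets are made up for this example, and returning 1.0 for two empty sets is an assumption (the formula itself is undefined when the union is empty).

```python
def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical (J = 1).
        return 1.0
    return len(a & b) / len(union)


# Two elements are shared (2, 3) out of four distinct elements overall.
print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```

A value of 1 means the two sets are identical, and 0 means they share no elements.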

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$J'(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
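The two equivalent forms of the distance can be checked with a small Python sketch. The function name `jaccard_distance` and the sample sets are hypothetical, and treating two empty sets as having distance 0 is an assumption, since the formula is undefined in that case.

```python
def jaccard_distance(a, b):
    """Return (|A ∪ B| - |A ∩ B|) / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical (distance 0).
        return 0.0
    return (len(union) - len(a & b)) / len(union)


a, b = {"x", "y", "z"}, {"y", "z", "w"}
similarity = len(a & b) / len(a | b)

print(jaccard_distance(a, b))  # (4 - 2) / 4 = 0.5
print(1 - similarity)          # same value, computed as 1 - J(A, B)
```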

Jaccard’s coefficient (a measure of similarity) and Jaccard’s distance (a measure of dissimilarity) are measures for asymmetric information on binary and non-binary variables.

__Example: Similarity of asymmetric binary attributes__

Given two objects, *A* and *B*, each with *n* binary attributes, the Jaccard coefficient is a useful measure of the overlap that *A* and *B* share in their attributes. Each attribute of *A* and *B* can be either 0 or 1. The total counts for each combination of attribute values across *A* and *B* are defined as follows:

$M_{11}$ represents the total number of attributes where *A* and *B* both have a value of 1.

$M_{01}$ represents the total number of attributes where the attribute of *A* is 0 and the attribute of *B* is 1.

$M_{10}$ represents the total number of attributes where the attribute of *A* is 1 and the attribute of *B* is 0.

$M_{00}$ represents the total number of attributes where *A* and *B* both have a value of 0.

Each attribute must fall into one of these four categories, meaning that

$$M_{11} + M_{01} + M_{10} + M_{00} = n$$
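To make the four counts concrete, the following Python sketch tallies them for two binary vectors. The function name `binary_match_counts` and the example vectors are invented for illustration only, and the inputs are assumed to contain only 0s and 1s.

```python
from collections import Counter


def binary_match_counts(a, b):
    """Count the four (A, B) value combinations across paired binary attributes."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of attributes")
    counts = Counter(zip(a, b))  # keys are (value in A, value in B)
    return {
        "M11": counts[(1, 1)],
        "M01": counts[(0, 1)],
        "M10": counts[(1, 0)],
        "M00": counts[(0, 0)],
    }


# Illustrative data only: six binary attributes per object.
a = [1, 0, 1, 1, 0, 0]
b = [1, 1, 0, 1, 0, 0]
m = binary_match_counts(a, b)
print(m)  # {'M11': 2, 'M01': 1, 'M10': 1, 'M00': 2}

# Every attribute falls into exactly one category, so the counts sum to n.
assert sum(m.values()) == len(a)
```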

Jaccard similarity coefficient J is given by

$$J = \frac{M_{11}}{M_{11} + M_{01} + M_{10}}$$

Jaccard distance J’ is given by

$$J' = \frac{M_{01} + M_{10}}{M_{11} + M_{01} + M_{10}}$$
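The binary formulas can be sketched in Python as follows. The function name `binary_jaccard` and the sample vectors are hypothetical, the inputs are assumed to contain only 0s and 1s, and raising an error when the denominator is zero is one possible choice, not something the formulas prescribe.

```python
def binary_jaccard(a, b):
    """Return (J, J') for two equal-length binary attribute vectors."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of attributes")
    m11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    mismatches = sum(1 for x, y in zip(a, b) if x != y)  # M01 + M10
    denominator = m11 + mismatches
    if denominator == 0:
        # Assumption: treat the all-zero case as an error (the ratio is 0/0).
        raise ValueError("undefined when no attribute is 1 in either object")
    return m11 / denominator, mismatches / denominator


# Illustrative data only: M11 = 2, M01 = 1, M10 = 1, M00 = 2.
a = [1, 0, 1, 1, 0, 0]
b = [1, 1, 0, 1, 0, 0]
similarity, distance = binary_jaccard(a, b)
print(similarity, distance)  # 0.5 0.5
```

Note that the 0/0 matches never enter either ratio, which is why the measure suits asymmetric binary attributes, where a shared absence carries little information.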

Sakshi Goel

13100

Finance Group-1
