Similarity and Dissimilarity. Similarity is a numerical measure of how alike two data objects are, and dissimilarity is a numerical measure of how different they are. The same idea applies to distributions, where a similarity measure describes how close two distributions are. Similarity and dissimilarity measures are fundamental to data mining; many other techniques build on some way of measuring distance.
Cosine Similarity. The cosine similarity is the cosine of the angle between two vectors. To compute it, divide the dot product of the two vectors by the product of their magnitudes, so the result depends on the vectors' directions and not on their lengths. Similarity measures provide the framework on which many data mining decisions are based. When similarity is expressed as a distance, a small distance indicates a high degree of similarity and a large distance indicates a low degree of similarity.
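The cosine similarity computation can be sketched in a few lines of Python. This is a minimal illustration that assumes the vectors are plain lists of numbers of equal length; the function name cosine_similarity is chosen here for illustration and does not come from any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b.

    Illustrative helper (hypothetical name): the dot product of the
    two vectors divided by the product of their magnitudes.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The angle is undefined when either vector has zero magnitude.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Same direction, different lengths: the value is close to 1.
    print(cosine_similarity([1, 2, 3], [2, 4, 6]))
    # Perpendicular vectors: the value is 0.
    print(cosine_similarity([1, 0], [0, 1]))
```

Values near 1 mean the vectors point in nearly the same direction, 0 means they are perpendicular, and -1 means they point in opposite directions.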

Measuring similarity or distance between two entities is a key step in several data mining and knowledge discovery tasks. In a data mining context, similarity is usually described as a distance in a space whose dimensions represent features of the objects. Estimating the similarity among objects is a common data mining task.
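To make the idea of distance over feature dimensions concrete, here is a small sketch that treats each object as a list of numeric features and measures the Euclidean distance between two objects. The choice of Euclidean distance, the 1 / (1 + d) conversion to a similarity score, and both function names are assumptions made for illustration; the text above does not prescribe a particular distance.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors.

    Each position in a and b is one feature (one dimension).
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(d):
    """Map a non-negative distance to a similarity in (0, 1].

    Assumed conversion for illustration: distance 0 gives similarity 1,
    and larger distances give smaller similarities.
    """
    return 1.0 / (1.0 + d)


if __name__ == "__main__":
    # Two objects described by two features each.
    d = euclidean_distance([0, 0], [3, 4])
    print(d, distance_to_similarity(d))
```

In practice, features measured on very different scales are often rescaled first, since otherwise the feature with the largest range dominates the distance.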