Agenda

Cluster analysis

Unsupervised classification

Similarity and dissimilarity measures

In what sense are the points within one cluster close to each other and far from the points in another cluster? How do we measure that?

A similarity measure takes a large value when two points are close. A dissimilarity measure takes a large value when two points are far apart, so it behaves like a distance between observations. Any monotone-decreasing function converts a similarity into a dissimilarity, and vice versa. Both similarity and dissimilarity measures can be subjective, for example when comparing the taste of three ice creams.
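As a rough illustration of the conversion described above, here is a minimal sketch. It assumes Euclidean distance as the dissimilarity and a Gaussian kernel as the similarity. Neither choice comes from these notes, and the function names, the scale parameter, and the three sample points are hypothetical.

```python
import math

def euclidean(a, b):
    """Dissimilarity: grows as the two points move apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gaussian_similarity(a, b, scale=1.0):
    """Similarity in (0, 1]: a monotone-decreasing function of distance.

    The Gaussian kernel and the scale parameter are assumptions made
    for this example only.
    """
    return math.exp(-euclidean(a, b) ** 2 / (2 * scale ** 2))

def to_dissimilarity(similarity):
    """Map a similarity in [0, 1] back to a dissimilarity via d = 1 - s."""
    return 1.0 - similarity

# Hypothetical observations: A and B are near each other, C is far away.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (4.0, 3.0)}

for first, second in [("A", "B"), ("A", "C")]:
    s = gaussian_similarity(points[first], points[second])
    d = to_dissimilarity(s)
    print(first, second, "similarity:", s, "dissimilarity:", d)
```

The pair A, B should come out with the higher similarity and the lower dissimilarity, since both functions are monotone in the underlying distance. Other decreasing maps, such as 1 / (1 + d), would preserve the same ordering.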

Clustering as an optimization problem
