Most vector-valued variables, and some scalar-valued variables, exhibit some degree of “clumpiness”: data points often fall near one another, with relatively few in the regions between natural groupings. K-means clustering can reveal such structure, and when the structure is present, the clustering can serve as a form of quantization. In this scheme, each centroid corresponds to a quantization level, and each data point is represented by the index of its nearest centroid.
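As a rough illustration of k-means used as a quantizer, the sketch below runs a plain Lloyd-style k-means on scalar data. The function name `kmeans_quantize`, its parameters, and the toy data are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def kmeans_quantize(x, k, n_iter=20, seed=0):
    """Scalar k-means (Lloyd's algorithm). Returns (levels, codes).

    Hypothetical helper for illustration: `levels` holds the k centroids,
    `codes` holds the index of the nearest centroid for each point.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Initialize the centroids from k randomly chosen data points.
    levels = rng.choice(x, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        codes = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            if np.any(codes == j):
                levels[j] = x[codes == j].mean()
    # Final assignment against the updated centroids.
    codes = np.argmin(np.abs(x[:, None] - levels[None, :]), axis=1)
    return levels, codes

# Toy "clumpy" data: three groups of scalar values.
rng = np.random.default_rng(1)
x = np.concatenate([
    rng.normal(-5.0, 0.3, 100),
    rng.normal(0.0, 0.3, 100),
    rng.normal(4.0, 0.3, 100),
])

levels, codes = kmeans_quantize(x, k=3)
x_hat = levels[codes]                 # dequantized values
mse = np.mean((x - x_hat) ** 2)       # quantization error
```

Here `codes` is what would be stored or transmitted, while `levels` plays the role of the codebook. With random initialization, k-means can settle in a poor local optimum, so in practice it is common to try several seeds and keep the result with the lowest error.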

For high-dimensional data, the distances between points tend to become more uniform, which makes nearest-centroid assignments less discriminative. To mitigate this problem, high-dimensional vectors are typically split into smaller sub-vectors, and k-means is applied to each group of sub-vectors separately. This procedure is called product quantization.
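The split-then-cluster procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`kmeans`, `assign`, `pq_train`, `pq_encode`, `pq_decode`), the choice of 4 sub-vectors with 8 centroids each, and the random test data are all hypothetical, and the vector dimension is assumed to divide evenly into the number of sub-vectors.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the rows of X. Returns the k centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean) for each row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def pq_train(X, n_sub, k):
    """Fit one codebook per block of columns (one per sub-vector)."""
    subs = np.split(X, n_sub, axis=1)   # requires n_sub to divide X.shape[1]
    return [kmeans(s, k, seed=i) for i, s in enumerate(subs)]

def pq_encode(X, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = np.split(X, len(codebooks), axis=1)
    return np.stack([assign(s, cb) for s, cb in zip(subs, codebooks)], axis=1)

def pq_decode(codes, codebooks):
    """Rebuild approximate vectors by concatenating the chosen centroids."""
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])

# Toy data: 500 vectors of dimension 16, split into 4 sub-vectors of size 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))

codebooks = pq_train(X, n_sub=4, k=8)
codes = pq_encode(X, codebooks)        # shape (500, 4), entries in 0..7
X_hat = pq_decode(codes, codebooks)    # shape (500, 16)
mse = np.mean((X - X_hat) ** 2)
```

In this configuration each vector is stored as 4 small integers, and the 4 codebooks hold only 32 centroids in total, yet their combinations can represent 8^4 = 4096 distinct reconstructed vectors. Each k-means run also operates in 4 dimensions rather than 16.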