tl;dr if you generate a random matrix $R \in \mathbb{R}^{k \times d}$ (with $k \ll d$) that projects high-dimensional data to low dimensions, you actually end up magically preserving pairwise distances pretty well. The entries don't even need to be Gaussian.
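To see this concretely, here's a quick numpy sketch (mine, not from any particular library): project random points with both a Gaussian matrix and a $\pm 1$ sign matrix, then compare pairwise distances before and after. The dimensions and the `pdists` helper are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 5_000, 500          # n points, ambient dim d, target dim k

X = rng.standard_normal((n, d))   # n points in R^d

def pdists(A):
    """All pairwise Euclidean distances between rows of A."""
    sq = np.sum(A * A, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.sqrt(np.maximum(d2, 0.0)[np.triu_indices(len(A), k=1)])

orig = pdists(X)

# Two random projections, both scaled by 1/sqrt(k) so squared norms are
# preserved in expectation: Gaussian entries, and plain +/-1 signs.
projections = {
    "gaussian": rng.standard_normal((d, k)) / np.sqrt(k),
    "+/-1 signs": rng.choice([-1.0, 1.0], size=(d, k)) / np.sqrt(k),
}

for name, R in projections.items():
    ratio = pdists(X @ R) / orig
    print(f"{name}: distance ratios in [{ratio.min():.3f}, {ratio.max():.3f}]")
```

With these settings both variants typically keep every pairwise distance within roughly ±10% of the original, which is exactly the lemma in action.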

Claude AI explained it as follows:

The JLL states that for any $0 < \varepsilon < 1$ and for any set of $n$ points in a high-dimensional space $\mathbb{R}^d$, there exists a linear map (random projection) $f: \mathbb{R}^d \to \mathbb{R}^k$ into a lower-dimensional space of dimension $k = O(\varepsilon^{-2} \log n)$, such that the pairwise distances between the points are preserved within a factor of $(1 \pm \varepsilon)$ with high probability.

More formally:

$$(1 - \varepsilon)\,\lVert u - v \rVert^2 \;\le\; \lVert f(u) - f(v) \rVert^2 \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert^2 \quad \text{for all points } u, v \text{ in the set},$$

where $\varepsilon$ is a tuning parameter that lets you dial in compression factor vs. lossiness. The lemma was introduced by William B. Johnson and Joram Lindenstrauss in 1984.
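To get a feel for that trade-off, here's a small sketch of one standard concrete bound, $k \ge 4 \ln n \,/\, (\varepsilon^2/2 - \varepsilon^3/3)$, from the Dasgupta–Gupta proof of the lemma (the function name is mine):

```python
import numpy as np

def jl_min_dim(n_points: int, eps: float) -> int:
    """Smallest target dimension k guaranteeing (1 +/- eps) distortion
    for n_points, per the Dasgupta-Gupta bound."""
    return int(np.ceil(4.0 * np.log(n_points) / (eps**2 / 2.0 - eps**3 / 3.0)))

# Note that k depends only on n and eps, not on the ambient dimension d.
for eps in (0.05, 0.1, 0.25, 0.5):
    print(f"eps={eps}: k >= {jl_min_dim(1_000_000, eps)}")
```

Loosening $\varepsilon$ shrinks $k$ fast (it scales like $1/\varepsilon^2$), and remarkably $k$ never depends on the original dimension $d$. scikit-learn exposes essentially this same bound as `sklearn.random_projection.johnson_lindenstrauss_min_dim`.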