Random partitioning of vectors

As in any distributed system, it may be beneficial to partition a collection of vectors into random groups. Doing so ignores any distributional structure of the collection: a query's nearest neighbors end up spread across all groups, so no group can be skipped at query time. This, in turn, precludes many opportunities to accelerate approximate search without substantial impact on precision and recall.
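To make the cost concrete, here is a minimal sketch of random partitioning with a scatter-gather query, assuming vectors are plain Python lists keyed by id and squared Euclidean distance is the metric. The names `random_partition`, `squared_l2`, and `search_all` are hypothetical, chosen for illustration only. Each partition is searched exactly here; in practice each would hold its own approximate index, but the fan-out to every partition stays the same.

```python
import heapq
import random


def random_partition(vectors, num_partitions, seed=0):
    """Assign each (id, vector) pair to a uniformly random partition."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(num_partitions)]
    for vec_id, vec in vectors.items():
        partitions[rng.randrange(num_partitions)].append((vec_id, vec))
    return partitions


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_all(partitions, query, k):
    """Scatter-gather: take each partition's local top-k, then merge.

    No partition can be skipped, because the assignment carries no
    information about where a query's neighbors live.
    """
    local_results = []
    for part in partitions:
        scored = [(squared_l2(vec, query), vec_id) for vec_id, vec in part]
        local_results.extend(heapq.nsmallest(k, scored))
    return [vec_id for _, vec_id in heapq.nsmallest(k, local_results)]
```

Merging the local top-k lists yields the same ids as a single global search, since every true neighbor is in the top-k of whichever partition holds it.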

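However, random partitioning is a fully data-independent strategy: a vector's group never depends on the other vectors in the collection, so inserts and deletes do not trigger re-partitioning. This can be useful for applications involving frequent real-time changes. In these cases, load balance can be maintained through, e.g., consistent hashing or virtual partitions.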
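One way this could look is a hash ring with virtual nodes, sketched below under stated assumptions: vector ids are hashable to a string, SHA-256 is used as the hash, and each physical node gets 64 points on the ring. The `HashRing` class and its methods are hypothetical names, not taken from any particular library.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # assumed number of ring points per node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node is placed at `vnodes` pseudo-random positions.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, vector_id):
        """Return the node owning the first ring point after the id's hash."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(str(vector_id)))
        return self._owner[self._points[idx % len(self._points)]]
```

With this layout, adding a node only moves the ids that land on the new node's ring segments; all other assignments are unchanged. More virtual nodes per physical node generally evens out the load at the cost of a larger ring.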

When workloads require frequent changes but query latency is also a concern, it can be beneficial to explore periodic rebuilds of the index, or hybrid storage systems in which vectors that have not yet been indexed are searched exactly alongside the index.
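A rough sketch of such a hybrid follows, assuming each id is inserted only once, deletes are out of scope, and a rebuild is triggered synchronously once the write buffer reaches a size threshold (a real system would more likely rebuild in the background). `HybridStore` and `ExactIndex` are hypothetical names; `ExactIndex` is only a stand-in for whatever approximate index is actually used.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ExactIndex:
    """Stand-in for an approximate index; swap in a real ANN structure."""

    def __init__(self, items):
        self.items = dict(items)

    def search(self, query, k):
        scored = ((squared_l2(v, query), i) for i, v in self.items.items())
        return heapq.nsmallest(k, scored)


class HybridStore:
    """Indexed segment plus an exactly searched buffer of new vectors."""

    def __init__(self, build_index=ExactIndex, rebuild_threshold=1000):
        self.build_index = build_index
        self.rebuild_threshold = rebuild_threshold  # assumed tuning knob
        self.indexed = {}   # vectors covered by the current index
        self.buffer = {}    # recent vectors not yet indexed
        self.index = build_index({})

    def insert(self, vec_id, vec):
        self.buffer[vec_id] = vec
        if len(self.buffer) >= self.rebuild_threshold:
            self.rebuild()

    def rebuild(self):
        # Fold the buffer into the indexed set and rebuild from scratch.
        self.indexed.update(self.buffer)
        self.buffer = {}
        self.index = self.build_index(self.indexed)

    def search(self, query, k):
        from_index = self.index.search(query, k)
        from_buffer = heapq.nsmallest(
            k, ((squared_l2(v, query), i) for i, v in self.buffer.items())
        )
        # Merge the two candidate lists and keep the k closest overall.
        return [i for _, i in heapq.nsmallest(k, from_index + from_buffer)]
```

The threshold trades write latency against query latency: a small buffer keeps the exact scan cheap but forces more frequent rebuilds.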

David's raw ML reference notes
