Data-dependent vs data-independent partitioning schemes

Nothing stops you from splitting the data in your vector database uniformly, without looking at its contents, as a form of load balancing. That is a data-independent scheme. But you can usually get much better performance by taking the underlying distribution into account, which is a data-dependent scheme. The catch is that if the distribution later changes, a partitioning fitted to the old data can hurt accuracy, performance, or both.
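To make the trade-off concrete, here is a minimal sketch under simplifying assumptions: the "vectors" are 1-D floats in [0, 1], the data-independent scheme is equal-width binning, and the data-dependent scheme is quantile boundaries fitted to a sample. All function names and the `imbalance` metric are hypothetical, chosen for illustration only; a real vector database would more likely use something like learned centroids, but the failure mode under drift is the same in spirit.

```python
import bisect
import random
from collections import Counter


def uniform_partition(x, k):
    """Data-independent: k equal-width bins over [0, 1], fixed in advance."""
    return min(int(x * k), k - 1)


def fit_quantile_boundaries(sample, k):
    """Data-dependent: pick k - 1 cut points so each bin holds ~1/k of the sample."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // k] for i in range(1, k)]


def quantile_partition(x, boundaries):
    """Assign x to the bin delimited by the fitted boundaries."""
    return bisect.bisect_right(boundaries, x)


def imbalance(assignments, k):
    """Largest partition size divided by the ideal size (1.0 means balanced)."""
    counts = Counter(assignments)
    return max(counts.values()) / (len(assignments) / k)


rng = random.Random(0)
k = 8
skewed = [rng.random() ** 3 for _ in range(10_000)]       # mass piled up near 0
shifted = [1 - rng.random() ** 3 for _ in range(10_000)]  # mass piled up near 1

# Fit the data-dependent scheme on the original (skewed) distribution only.
boundaries = fit_quantile_boundaries(skewed, k)

uniform_on_skewed = imbalance([uniform_partition(x, k) for x in skewed], k)
quantile_on_skewed = imbalance([quantile_partition(x, boundaries) for x in skewed], k)
quantile_on_shifted = imbalance([quantile_partition(x, boundaries) for x in shifted], k)

print(f"uniform bins, original data:     {uniform_on_skewed:.2f}")
print(f"quantile bins, original data:    {quantile_on_skewed:.2f}")
print(f"quantile bins, after data shift: {quantile_on_shifted:.2f}")
```

On the skewed data, roughly half the points fall into the first equal-width bin, so the uniform scheme is badly unbalanced while the fitted boundaries stay close to balanced. Once the distribution flips, the same fitted boundaries send most points into a single partition, which is the drift problem described above.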

David's raw ML reference notes
