Background

Consistent hashing is a partitioning strategy that involves placing hash codes on a number circle (“hash ring”) with fixed points, where is the number of partitions. To identify which partition a hash belongs to, you go clockwise until you find a fixed point.

Xu introduces consistent hashing as an unalloyed good, dedicating an entire chapter to it (ch. 5). On the other hand, Kleppman says that, in practice, it’s a lousy strategy for partitioning and is rarely used (p. 330), favoring either Dynamic partitioning or using a huge number of partitions.

It’s not clear to me why Kleppman considers these to be either/or; you still need a way to decide which of your (dynamic or virtual) partitions you’re going to put something in, and consistent hashing is one way to do that.

Implementation

To start, you need to identify the range of your hash function. For example, SHA-1’s range is . In the basic implementation, you divide this up into equal spaces. Then you just need to do a range lookup to determine to which partition a data point should go (or should be read from).

In the event that you need to add or remove a partition, you need to reassign hashes (on average), where is the number of records in your dataset. (Of course, you can still construct a pathological case where you end up needing to make reassignments because all of the data ended up in one partition.)

You can use consistent hashing with virtual partitioning by creating fixed points instead, and then assigning partitions to each node. You can then move virtual partitions around when a node is taken offline.

David's raw ML reference notes

Explorer

Consistent hashing

Background

Implementation

Graph View

Table of Contents

Backlinks