Siamese, triplet, and two-tower networks are closely related neural network architectures. All three are used for contrastive learning and learning-to-rank. Siamese and two-tower models in particular are sometimes treated as interchangeable, though this is not strictly accurate.

A two-tower network consists of two separate neural networks that are trained jointly with a contrastive loss function, such as triplet loss or the original (pairwise) contrastive loss. The two towers often have similar architectures, but this is not required. The architecture is commonly used to align embeddings from two different domains into a shared latent space.
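As an illustration, here is a minimal PyTorch sketch of a two-tower model; the query/item naming, layer sizes, and L2 normalization are assumptions made for the example, not part of any fixed recipe.

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Two independent towers that map different input domains
    (e.g. text queries and item features) into one latent space."""
    def __init__(self, query_dim=300, item_dim=1024, embed_dim=128):
        super().__init__()
        # The towers may differ in architecture; they only need to
        # produce embeddings of the same dimensionality.
        self.query_tower = nn.Sequential(
            nn.Linear(query_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim))

    def forward(self, query, item):
        # L2-normalized embeddings in a shared space; any contrastive
        # loss (pairwise or triplet) can be applied to these outputs.
        q = F.normalize(self.query_tower(query), dim=-1)
        v = F.normalize(self.item_tower(item), dim=-1)
        return q, v
```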

A Siamese network conceptually consists of two identically structured networks with shared weights, possibly preceded by an initial linear layer that brings the inputs to matching dimensionality. The two networks learn aligned embeddings via the original contrastive loss. When the networks compare entities from the same domain, they can be identical end to end. In practice, the architecture is therefore usually implemented as a single network through which two forward passes are made for each training pair, as in the sketch below.
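A minimal PyTorch sketch of this shared-encoder implementation with the original pairwise contrastive loss; the encoder architecture and all dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A single encoder whose weights are shared across both "branches".
encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))

def siamese_contrastive_step(x1, x2, label, margin=1.0):
    # Two forward passes through the same network implement the
    # "two identical networks with shared weights" view.
    e1, e2 = encoder(x1), encoder(x2)
    dist = (e1 - e2).norm(dim=-1)
    # Original pairwise contrastive loss: pull matching pairs (label=1)
    # together, push non-matching pairs (label=0) at least `margin` apart.
    return (label * dist.pow(2)
            + (1 - label) * F.relu(margin - dist).pow(2)).mean()

# Example usage with random data (shapes are arbitrary assumptions).
x1, x2 = torch.randn(8, 64), torch.randn(8, 64)
label = torch.randint(0, 2, (8,)).float()
loss = siamese_contrastive_step(x1, x2, label)
loss.backward()
```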

A triplet network is structured like a Siamese network but is trained slightly differently. Here an “anchor” example is compared with both a “positive” example and a “negative” example; the triplet loss then trains the model so that the negative ends up at least a certain margin farther from the anchor than the positive.
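A corresponding triplet-training sketch, again in PyTorch with a placeholder shared encoder; PyTorch's built-in nn.TripletMarginLoss stands in for the margin-based triplet loss described above.

```python
import torch
import torch.nn as nn

# Same shared encoder as in the Siamese case (a placeholder assumption).
encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor   = torch.randn(8, 64)  # anchor examples
positive = torch.randn(8, 64)  # examples similar to the anchors
negative = torch.randn(8, 64)  # examples dissimilar to the anchors

# The loss reaches zero once each negative is at least `margin` farther
# from its anchor than the corresponding positive.
loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```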

The two-tower architecture can be seen as a generalization of both the Siamese and triplet architectures: it relaxes the weight sharing constraint and permits the use of any contrastive loss function.