SBERT (“Sentence BERT”) is a variant of BERT that has been fine-tuned to generate “sentence” (document) embeddings. It is optimized for tasks that require directly comparing documents, such as retrieval.
The original BERT does provide a single embedding, the [[Classification token (CLS)|[CLS] embedding]], that represents the entire sequence; however, it also produces an embedding for every other token in the sequence. When comparing sentences, prior approaches generally fell into two categories:
- The fast way is to somehow combine the token embeddings into a pooled embedding, and then compare these pooled embeddings. This is fast, but (relatively) low-fidelity (see the sketch after this list). Methods include:
    - Mean token pooling (used in the SBERT paper)
    - Using just the `[CLS]` token
- The slow way is to perform a full token-wise comparison using cross-attention. BERT provides a way to do this directly: concatenate the two sentences, separated by the `[SEP]` token, and pass the pair through the model. It is also possible to obtain separate embeddings, concatenate them, and pass them to a task head.
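For concreteness, here is a minimal sketch of the fast way, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (neither is specified above; both are illustrative choices). Token embeddings are mean-pooled under the attention mask and compared with cosine similarity:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any BERT-style encoder works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def mean_pooled_embeddings(sentences: list[str]) -> torch.Tensor:
    """Encode sentences and mean-pool their token embeddings (the fast way)."""
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state  # (batch, seq, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1).float()     # ignore padding positions
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

u, v = mean_pooled_embeddings(["A man is playing guitar.", "Someone plays music."])
print(torch.nn.functional.cosine_similarity(u.unsqueeze(0), v.unsqueeze(0)).item())
```

The slow way would instead tokenize the pair jointly, e.g. `tokenizer(sentence_a, sentence_b, return_tensors="pt")`, which inserts the `[SEP]` separator so cross-attention sees both sentences at once.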
As noted, SBERT essentially fine-tunes BERT to be better at “the fast way.” To accomplish this, the authors fine-tuned BERT to classify pairs of sentences as entailment, contradiction, or neither (neutral), training on the SNLI dataset.
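A rough sketch of that classification objective, assuming pooled embeddings `u` and `v` produced by the same BERT encoder (with gradients enabled during training, unlike the inference-only helper above), and using the (u, v, |u - v|) feature concatenation described in the SBERT paper; the layer sizes and label order here are illustrative:

```python
import torch
import torch.nn as nn

HIDDEN = 768      # BERT-base hidden size
NUM_LABELS = 3    # entailment, contradiction, neutral

# Softmax classification head over (u, v, |u - v|), trained jointly with the encoder.
classifier = nn.Linear(3 * HIDDEN, NUM_LABELS)
loss_fn = nn.CrossEntropyLoss()

def nli_loss(u: torch.Tensor, v: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """u, v: pooled sentence embeddings, shape (batch, HIDDEN); labels: shape (batch,)."""
    features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
    return loss_fn(classifier(features), labels)
```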
Sentence BERT can be implemented as a Siamese or triplet network, though in practice this essentially boils down to making multiple passes through a single network in parallel.
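In practice, this Siamese-encoding pattern is what the `sentence-transformers` package from the SBERT authors provides; a minimal usage sketch, assuming that library is installed and using an illustrative checkpoint name:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative checkpoint; any SBERT-style model is used the same way.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A man is playing guitar.", "Someone plays music."]
# One shared encoder, one pass per sentence (the "Siamese" part).
embeddings = model.encode(sentences, convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]))
```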