Masked language modeling (MLM) is a self-supervised learning task used to train language models such as BERT. It consists of predicting missing (masked) words in a sequence given their surrounding context. In BERT, roughly 15% of the input tokens are selected for prediction, and each selected token is handled as follows (a code sketch follows the list):

  • In 80% of cases, the selected token is replaced with the [MASK] token.
  • In 10% of cases, the selected token is replaced with a random token from the vocabulary.
  • In the remaining 10% of cases, the selected token is left unchanged.

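In all three cases, the training objective is to recover the original token at the selected position. Below is a minimal PyTorch sketch of this selection-and-replacement step, assuming token IDs are already available; the function name `mask_tokens` and its arguments are illustrative rather than BERT's actual implementation, and the exclusion of special tokens such as [CLS] and [SEP] from masking is omitted for brevity.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style masking: select ~15% of tokens, then apply the 80/10/10 rule."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    # Select ~15% of positions as prediction targets.
    selected = torch.bernoulli(torch.full(labels.shape, mlm_prob)).bool()
    labels[~selected] = -100  # unselected positions are ignored by the loss

    # 80% of the selected positions are replaced with [MASK].
    masked = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    input_ids[masked] = mask_token_id

    # Half of the remaining selected positions (10% overall) get a random token.
    random_repl = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~masked
    input_ids[random_repl] = torch.randint(vocab_size, labels.shape)[random_repl]

    # The final 10% of selected positions keep their original token.
    return input_ids, labels
```

The -100 label value is the default ignore index of PyTorch's cross-entropy loss, so only the selected ~15% of positions contribute to the MLM objective.
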
MLM recalls an earlier, unordered context-prediction task used by word2vec: continuous bag-of-words (CBOW). In both cases, learning to predict a word from its context forces the model to capture the latent semantics of the missing word, so both can be viewed as a form of denoising.
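
For comparison, here is a toy CBOW-style sketch, assuming a plain full-softmax classifier rather than word2vec's actual training tricks (negative sampling or hierarchical softmax); the class name `CBOW` and all sizes are illustrative. The context embeddings are averaged, so word order plays no role.

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    """Toy continuous bag-of-words: predict a center word from the average
    of its (unordered) context word embeddings."""
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context_size); averaging discards word order
        ctx = self.embed(context_ids).mean(dim=1)  # (batch, embed_dim)
        return self.out(ctx)                       # logits over the vocabulary

# Predict a "missing" center word from a window of four context words.
model = CBOW(vocab_size=10_000, embed_dim=64)
context = torch.randint(10_000, (8, 4))  # batch of 8 context windows
target = torch.randint(10_000, (8,))     # the center words to recover
loss = nn.functional.cross_entropy(model(context), target)
```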