When neural networks first caught on, high-dimensional data was typically compressed with random projection: multiply the data by a fixed random matrix, which approximately preserves pairwise distances (the Johnson-Lindenstrauss lemma). In recent years, however, it has become increasingly common to instead train the embedding matrix as you would any other set of weights.
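
Here is a minimal sketch of that older approach in numpy (the shapes and data are illustrative, not from any particular system):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 points in 10,000 dimensions (e.g., bag-of-words vectors).
n, d, k = 1_000, 10_000, 64
X = rng.standard_normal((n, d))

# Random projection: a fixed Gaussian matrix, scaled so that norms and
# pairwise distances are approximately preserved in expectation.
R = rng.standard_normal((d, k)) / np.sqrt(k)
X_low = X @ R  # shape (n, k)

# Sanity check: the distance between two points survives the projection.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(X_low[0] - X_low[1]))
```

Note that nothing is learned here: R is fixed at initialization, which is exactly what training an embedding matrix replaces.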

The model that popularized learned embeddings was word2vec (2013), a shallow log-linear model with a highly specialized training objective (skip-gram or CBOW, typically trained with negative sampling). It was followed by various other specialized “-2vec” models, like node2vec, paragraph2vec, etc.
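
To see how shallow the model really is, here is a minimal sketch of skip-gram with negative sampling in numpy (the corpus and hyperparameters are toy values chosen for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 16                        # vocab size, embedding dim

W_in = rng.normal(scale=0.1, size=(V, D))    # center-word embeddings
W_out = rng.normal(scale=0.1, size=(V, D))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, n_neg = 0.05, 2, 5
for epoch in range(200):
    for pos, word in enumerate(corpus):
        c = idx[word]
        lo, hi = max(0, pos - window), min(len(corpus), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            o = idx[corpus[ctx_pos]]
            # Positive pair gets label 1; random words get label 0.
            # (Real word2vec samples negatives from a smoothed unigram
            # distribution; uniform sampling is a simplification here.)
            targets = [o] + list(rng.integers(0, V, size=n_neg))
            labels = [1.0] + [0.0] * n_neg
            for t, y in zip(targets, labels):
                score = sigmoid(W_in[c] @ W_out[t])
                grad = score - y                 # d(loss)/d(score)
                g_in = grad * W_out[t]           # grad w.r.t. center vector
                W_out[t] -= lr * grad * W_in[c]
                W_in[c] -= lr * g_in
```

After training, the rows of W_in are the word vectors. Note the model never applies a nonlinearity to the embeddings themselves; the only nonlinearity is the sigmoid in the loss, which is why it is described as log-linear.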

Not long after, sequence models (e.g., the sequence-to-sequence architecture of Sutskever et al., 2014) dropped the specialized objective and simply learned the embedding matrix end-to-end as a plain linear layer, trained jointly with the rest of the network. A linear embedding matrix is quite computationally efficient, which is important for big models that have to embed every token in a long sequence.
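
A minimal sketch of why the embedding layer is linear, and why it is cheap (sizes are illustrative): multiplying a one-hot vector by a weight matrix just selects one row, so the layer can be implemented as an indexed lookup.

```python
import numpy as np

rng = np.random.default_rng(0)

V, D = 1_000, 64                   # vocab size, embedding dim
E = rng.standard_normal((V, D))    # embedding matrix (learned in practice)

token_ids = np.array([3, 141, 592, 3])

# An embedding layer is a linear map applied to one-hot vectors ...
one_hot = np.eye(V)[token_ids]     # shape (4, V)
embedded_matmul = one_hot @ E      # shape (4, D)

# ... but is implemented as a row lookup, which costs O(seq_len * D)
# instead of O(seq_len * V * D).
embedded_lookup = E[token_ids]

assert np.allclose(embedded_matmul, embedded_lookup)
```

The lookup form is why embedding cost stays negligible even at large vocabulary sizes: the per-token work depends only on the embedding dimension, not on V.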