Basic idea
Cho et al. (2014) introduced the Encoder-Decoder Model, an autoregressive neural network with three parts: an encoder (whose hidden states summarize the input sequence), a fixed-length context vector (the encoder's final hidden state), and a decoder (which generates the output sequence one token at a time).
During training, a source sequence and its target sequence are presented together, and the model's parameters are fit to maximize the conditional log-likelihood of the target given the source.
The encoder and the decoder are both recurrent; i.e., at every step each takes both an exogenous input and the preceding hidden state as inputs. The external inputs to the encoder are the tokens of the source sequence, consumed one at a time.
The initial external input to the decoder is a special “start-of-sequence” token, and its initial hidden state is the context vector produced by the encoder. At each subsequent step, the decoder's external input is the token it generated at the previous step.
The decoder keeps generating states until one of them emits a special end-of-sequence token, at which point the process terminates.
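To make this concrete, here is a minimal sketch of the architecture in PyTorch. This is not the original implementation: the GRU cells, hidden size, greedy decoding, and the sos_id/eos_id token indices are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Minimal sketch of an RNN encoder-decoder in the style of Cho et al. (2014)."""
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src, sos_id, eos_id, max_len=50):
        # Encoder: consume the whole source sequence; keep only the
        # final hidden state, which serves as the context vector.
        _, context = self.encoder(self.embed(src))

        # Decoder: start from the context vector and a start-of-sequence
        # token; feed each generated token back in as the next input.
        token = torch.full((src.size(0), 1), sos_id, dtype=torch.long)
        hidden, outputs = context, []
        for _ in range(max_len):
            state, hidden = self.decoder(self.embed(token), hidden)
            token = self.out(state).argmax(dim=-1)  # greedy token choice
            outputs.append(token)
            if (token == eos_id).all():  # stop at end-of-sequence
                break
        return torch.cat(outputs, dim=1)

# Example usage (untrained weights, so the output is arbitrary):
model = EncoderDecoder(vocab_size=1000)
src = torch.randint(0, 1000, (1, 12))       # a dummy source sequence
generated = model(src, sos_id=1, eos_id=2)  # greedy autoregressive decode
```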
Implications
The entire input sequence is compressed into a single fixed-length hidden state by the time the decoder begins generating tokens. That bottleneck means the model struggles to capture relationships between individual tokens, especially in long sequences. Bahdanau, Cho, and Bengio (2014) introduced attention to solve exactly this problem.
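For contrast, here is a rough sketch of the additive attention score from Bahdanau et al. (2014): instead of routing everything through one fixed context vector, the decoder recomputes a weighted average over all encoder hidden states at every step. The tensor shapes and the single-layer scoring network are simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score every encoder state against the
    current decoder state, then average with softmax weights."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, decoder_state, encoder_states):
        # decoder_state: (batch, hidden); encoder_states: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.W_enc(encoder_states) + self.W_dec(decoder_state).unsqueeze(1)
        ))                                      # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)  # attention over source positions
        return (weights * encoder_states).sum(dim=1)  # fresh per-step context
```

Because the context vector is recomputed at every decoder step, long-range relationships no longer have to squeeze through a single fixed-length state.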