Why EOS token in Encoder input?


In the encoder-decoder sequence-to-sequence model, why does there have to be an EOS token in the encoder input?

For the decoder, the SOS token is important in the autoregressive formulation, since you take a token plus the context to predict the next token, and EOS is required to stop. But the encoder doesn’t require an EOS, right?
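To make the decoder side concrete, here is a toy greedy decoding loop showing the two roles described above: SOS seeds generation, EOS terminates it. The "model" is a hypothetical stand-in (a fixed lookup table), not a real seq2seq network.

```python
# Toy greedy decoding loop illustrating the roles of SOS and EOS.
# The next-token "model" below is a made-up lookup table for illustration.

SOS, EOS = "<sos>", "<eos>"

# Hypothetical predictions: given the previous token, predict the next one.
next_token = {SOS: "the", "the": "cat", "cat": "sat", "sat": EOS}

def greedy_decode(max_len=10):
    token = SOS           # SOS seeds the autoregressive loop
    output = []
    for _ in range(max_len):
        token = next_token[token]
        if token == EOS:  # EOS tells the loop when to stop generating
            break
        output.append(token)
    return output

print(greedy_decode())  # → ['the', 'cat', 'sat']
```

Without EOS in the decoder's training targets, the loop would have no learned stopping signal and would only halt at `max_len`.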

The tutorial here says you need an EOS token in both the encoder and decoder input: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html


Just replying to give the question more visibility since I had the same question. I found this post online where it says:

[…] Terminating the input in an end-of-sequence (EOS) token signals to the encoder that when it receives that input, the output needs to be the finalized embedding. […]

But this doesn’t make sense to me. Once the last word has been processed, the resulting hidden state is already the final one. I cannot see any need to "signal the encoder to finalize the embedding" (i.e., the hidden state).
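The point above can be illustrated with a toy sketch: the hidden state after the last real token is already the final summary, and appending EOS just applies one more ordinary update. The update rule and token ids here are made up for illustration, not a real RNN cell.

```python
# Toy "encoder": new hidden = f(old hidden, input). The 0.5 factor is arbitrary.
def step(h, x):
    return 0.5 * h + x

def encode(tokens):
    h = 0.0
    for x in tokens:
        h = step(h, x)
    return h

EOS = 99  # hypothetical id for <eos>

seq = [1, 2, 3]
h_plain = encode(seq)          # hidden state right after the last word
h_eos = encode(seq + [EOS])    # same sequence plus an EOS step

# EOS does not "finalize" anything special: it is just one more update.
print(h_eos == step(h_plain, EOS))  # → True
```

So mechanically there is no finalization step; whether the extra EOS update helps the learned model is an empirical question.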

Have you compared the results with and without adding EOS to the input sequences?


I have the same question: why do we need EOS and SOS (BOS)? Have you figured it out?

I still don’t see the need for EOS in the case of the encoder. So I don’t use it 🙂

I think it depends on the implementation: some need EOS, some don’t. But to be safe, just add it to the end of sentences.
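One implementation-dependent reason to keep EOS: in padded batches, an EOS token marks where each real sentence ends. The token ids and sentences below are hypothetical.

```python
# With padded batches, EOS disambiguates real content from trailing PADs
# (absent a separate lengths tensor). Ids here are made up for illustration.
PAD, EOS = 0, 1

batch = [
    [5, 6, 7, EOS],       # full-length sentence
    [8, 9, EOS, PAD],     # shorter sentence, padded to the same length
]

def true_length(ids):
    # Recover the real sequence length (including EOS) from the EOS marker.
    return ids.index(EOS) + 1

print([true_length(s) for s in batch])  # → [4, 3]
```

An implementation that instead passes explicit sequence lengths (e.g. for packing) would not need EOS for this purpose, which may be why some codebases drop it.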

But why, though? I can see that it probably won’t do any harm, but why would it be truly needed for the encoder?