In the seq2seq tutorial and many other PyTorch NLP tutorials, I see that people often run an RNN inside a loop with the sequence length set to 1. I would like to ask whether there is any difference if the RNN is run on the entire sequence without the loop. In addition, if an RNN can be used that way, how is it different from RNNCell?
Many interesting RNN architectures need operations beyond what the “vanilla” multi-timestep RNN/GRU/LSTM modules can express. The classic case is when one timestep’s output affects the next timestep’s input; attention is a prominent example.
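Here is a minimal sketch of that feedback pattern, a greedy decoding loop where each step’s predicted token becomes the next step’s input. The module sizes, the output projection, and the start token are all illustrative, not from any particular tutorial; the point is only that the data dependency forces a Python loop over `GRUCell` calls:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

vocab, emb_size, hidden_size, steps = 10, 4, 6, 5

# Illustrative modules; sizes are arbitrary.
embed = nn.Embedding(vocab, emb_size)
cell = nn.GRUCell(emb_size, hidden_size)
proj = nn.Linear(hidden_size, vocab)  # hypothetical output projection

h = torch.zeros(1, hidden_size)
token = torch.tensor([0])  # hypothetical start-of-sequence token

generated = []
for _ in range(steps):
    h = cell(embed(token), h)       # one timestep
    token = proj(h).argmax(dim=1)   # this output is the NEXT input,
    generated.append(token.item())  # which a one-shot nn.GRU call cannot express

print(generated)
```

Because the input at step t+1 only exists after step t has run, the whole sequence cannot be handed to `nn.GRU` in a single call.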
Once you have something like that, you cannot use the multi-timestep modules anymore. I guess people still like them, either because they’re used to them or because they still offer some speed advantage, but fundamentally most uses are very similar to the *Cell variants.
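To make the equivalence concrete, here is a small sketch (sizes are arbitrary) that copies the weights of an `nn.RNN` into an `nn.RNNCell` and checks that one call over the full sequence matches an explicit per-timestep loop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 5, 3, 4, 6

# Multi-timestep module: processes the whole sequence in one call.
rnn = nn.RNN(input_size, hidden_size)

# Cell variant: one timestep per call; copy the same weights for comparison.
cell = nn.RNNCell(input_size, hidden_size)
cell.weight_ih.data = rnn.weight_ih_l0.data.clone()
cell.weight_hh.data = rnn.weight_hh_l0.data.clone()
cell.bias_ih.data = rnn.bias_ih_l0.data.clone()
cell.bias_hh.data = rnn.bias_hh_l0.data.clone()

x = torch.randn(seq_len, batch, input_size)  # (seq, batch, feature) layout
h0 = torch.zeros(1, batch, hidden_size)

# One call over the whole sequence.
out_full, h_full = rnn(x, h0)

# Explicit Python loop, one timestep at a time.
h = h0[0]
outs = []
for t in range(seq_len):
    h = cell(x[t], h)
    outs.append(h)
out_loop = torch.stack(outs)

print(torch.allclose(out_full, out_loop, atol=1e-6))  # True
```

The results agree, so when nothing at step t+1 depends on the output at step t, the loop and the single call compute the same thing; the single call is just faster because the timestep loop runs inside the (possibly cuDNN-backed) module rather than in Python.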