Hi, I’ve just begun using PyTorch and was going through the RNN example at https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial. As per my understanding, the current time step’s output should be predicted from the current hidden state. But here, the previous time step’s hidden state seems to be used instead. Can I get an explanation? Thanks.
It may be a bit confusing, but it is shifted. The reason is that before the first step you already have an a priori hidden state, so the first step looks like rnn(input[0], prior_hidden) -> output[0], hidden[0]. This way, the tensor sizes match along the time dimension.
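To make the shift concrete, here is a minimal sketch of the tutorial-style cell (sizes are made up for illustration; the real tutorial module also applies a LogSoftmax): the output at step t is computed from input[t] concatenated with the *previous* hidden state, and the new hidden state comes out alongside it.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for the sketch
input_size, hidden_size, output_size = 5, 8, 3
seq_len = 4

i2h = nn.Linear(input_size + hidden_size, hidden_size)
i2o = nn.Linear(input_size + hidden_size, output_size)

inputs = torch.randn(seq_len, 1, input_size)
hidden = torch.zeros(1, hidden_size)  # a priori hidden state (all zeros)

outputs = []
for t in range(seq_len):
    # combined holds input[t] together with hidden[t-1]
    combined = torch.cat((inputs[t], hidden), dim=1)
    hidden = torch.tanh(i2h(combined))  # hidden[t] from (input[t], hidden[t-1])
    outputs.append(i2o(combined))       # output[t] also uses hidden[t-1]
outputs = torch.stack(outputs)
print(outputs.shape)  # torch.Size([4, 1, 3])
```

So at step 0 the zero-initialized hidden state already participates in the prediction, and there are exactly seq_len outputs and seq_len hidden states.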
These are the forward pass equations I’m most familiar with. At the very first time step, the initial hidden state a_prev (h[0] initialized to 0) is never used to make output predictions, whereas in the above PyTorch tutorial, a_prev gets to make the first output prediction. Is this a difference in architecture/implementation? Sorry, I’m still sort of confused.
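For contrast, here is a hedged sketch of the convention in the standard equations, where the output at step t is computed from the *current* hidden state h[t] (same hypothetical sizes as above; not the tutorial's exact module). The difference between the two conventions is purely in the wiring of the output layer, and both are trainable.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for the sketch
input_size, hidden_size, output_size = 5, 8, 3
seq_len = 4

i2h = nn.Linear(input_size + hidden_size, hidden_size)
h2o = nn.Linear(hidden_size, output_size)  # output layer reads the hidden state

inputs = torch.randn(seq_len, 1, input_size)
hidden = torch.zeros(1, hidden_size)  # h[0] initialized to zeros, never used for output

outputs = []
for t in range(seq_len):
    combined = torch.cat((inputs[t], hidden), dim=1)
    hidden = torch.tanh(i2h(combined))  # h[t] from (x[t], h[t-1])
    outputs.append(h2o(hidden))         # y[t] from the *current* h[t]
outputs = torch.stack(outputs)
print(outputs.shape)  # torch.Size([4, 1, 3])
```

In both variants the same information eventually flows to the output; the tutorial's version just reads it one step earlier, which is why the indexing looks shifted.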