Correct way to train without teacher forcing


I am trying to train an RNN model without teacher forcing, where the input is an encoded image. The input to the RNN is the output of an FC layer.

# initialize hidden
# run rnn one step at a time
for step in range(nsteps):
    output, hidden = RNN(input, hidden)
    # project output to same size as input
    proj = projection_layer(output)
    input = Variable(, requires_grad=True)

Q. Is this correct? Do I need to re-wrap the output in a Variable like above? If I just do input = proj, I am guessing the gradients will accumulate, which is not desirable.



If you do input = proj then backpropagation will flow back through time via the hidden state AND through the previous step’s output. This might be a good thing, but if you don’t want it, then you need one of the following fixes.

  • input = Variable( # no need for requires_grad
  • input = proj.detach()
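To make the difference concrete, here is a minimal sketch that runs the same rollout twice, once with input = proj and once with input = proj.detach(), and compares the gradients on the projection layer. The GRUCell, the layer sizes, the MSE loss and the random targets are all placeholders I made up for illustration; they are not from the original post.

```python
import torch
import torch.nn as nn

def rollout(detach_feedback, nsteps=4, batch=2, feat=8, hid=16):
    # Re-seed so both runs share identical weights and data (illustrative setup).
    torch.manual_seed(0)
    cell = nn.GRUCell(feat, hid)              # stand-in for the RNN
    projection_layer = nn.Linear(hid, feat)   # maps hidden state back to input size
    inp = torch.randn(batch, feat)            # stand-in for the FC-encoded image
    targets = torch.randn(nsteps, batch, feat)  # hypothetical per-step targets
    hidden = torch.zeros(batch, hid)

    loss = 0.0
    for step in range(nsteps):
        hidden = cell(inp, hidden)
        proj = projection_layer(hidden)
        loss = loss + nn.functional.mse_loss(proj, targets[step])
        # Feed the prediction back in. detach() cuts the gradient path through
        # the input; the path through `hidden` is kept either way.
        inp = proj.detach() if detach_feedback else proj

    loss.backward()
    return projection_layer.weight.grad.clone(), inp.requires_grad

grad_detached, fed_back_detached = rollout(detach_feedback=True)
grad_full, fed_back_full = rollout(detach_feedback=False)

print("fed-back input tracks gradients (detached):", fed_back_detached)
print("fed-back input tracks gradients (full):    ", fed_back_full)
print("same projection gradients:", torch.allclose(grad_detached, grad_full))
```

In both variants the loss still backpropagates through time via the hidden state; the only thing detach() removes is the extra path from a later step's loss back through the earlier step's output.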

Hi @jpeg729 Thanks for the reply.
How would you do it? I.e., would you allow the gradient to flow through the output of the previous step or not?
It seems like you might want to do that when training something like a language model.


I have never seen, nor heard of it being done, but I would be intrigued to see if it did improve the performance of any sort of autoregressive model (where the previous output is used as input).

@jpeg729 Hmm. I am trying to do something like scheduled sampling, but for video prediction.
I am guessing they allow the gradient to flow through the output of the previous step.

I don’t think they do. They seem to treat the previous output as the current input, and as a general rule you don’t backpropagate through your inputs. So unless they say very clearly that they do backpropagate through the previous outputs, assume they don’t.
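For what it's worth, here is a rough sketch of that reading of scheduled sampling: at each step a coin flip decides whether the next input is the ground-truth frame or the model's own previous output, and the model output is detached so it is treated as a plain input. The name teacher_prob, the GRUCell, the sizes and the MSE loss are my own assumptions for illustration, not something taken from the authors.

```python
import random
import torch
import torch.nn as nn

def sampled_rollout(frames, teacher_prob, seed=0):
    """frames: (T, batch, feat). Predict frame t+1 from the input at step t."""
    rng = random.Random(seed)
    torch.manual_seed(seed)
    feat, hid = frames.size(-1), 16           # hidden size is an arbitrary choice
    cell = nn.GRUCell(feat, hid)              # stand-in for the RNN
    projection_layer = nn.Linear(hid, feat)
    hidden = torch.zeros(frames.size(1), hid)

    inp = frames[0]
    loss = 0.0
    n_teacher = 0
    for t in range(frames.size(0) - 1):
        hidden = cell(inp, hidden)
        proj = projection_layer(hidden)
        loss = loss + nn.functional.mse_loss(proj, frames[t + 1])
        if rng.random() < teacher_prob:
            inp = frames[t + 1]               # teacher-forced step: ground truth
            n_teacher += 1
        else:
            inp = proj.detach()               # own prediction, treated as a plain input
    loss.backward()
    return loss.item(), n_teacher

frames = torch.randn(5, 2, 8)                 # hypothetical clip: 5 frames, batch 2
_, n_all = sampled_rollout(frames, teacher_prob=1.0)   # always ground truth
_, n_none = sampled_rollout(frames, teacher_prob=0.0)  # always own output
print(n_all, n_none)
```

In practice teacher_prob would be decayed over training, starting near 1.0 and moving toward 0.0, so the model gradually sees more of its own predictions.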

That said, if you are still in doubt, why don’t you email the authors and ask them? If you do, try to pose your question as clearly as possible because if they don’t clearly understand the question then they might not find the time to reply.

@jpeg729 Thanks for all your help!
