Correct way to train without teacher forcing

Gautam_Bhattacharya · March 26, 2018, 5:32pm

Hello,

I am trying to train an RNN model without teacher forcing, where the input is an encoded image. The input to the RNN comes from an FC layer.

# initialize hidden
# run rnn one step at a time
for step in range(nsteps):
    output, hidden = RNN(input, hidden)
    # project output to same size as input
    proj = projection_layer(output)
    input = Variable(proj.data, requires_grad=True)

Q. Is this correct? Do I need to wrap proj.data like above? If I just do input = proj, I am guessing the gradients will accumulate, which is not desirable.

Thanks

jpeg729 · March 27, 2018, 9:32am

If you do input = proj then backpropagation will flow back through time via the hidden state AND through the previous step’s output. This might be a good thing, but if you don’t want it, then you need one of the following fixes.

input = Variable(proj.data) # no need for requires_grad
input = proj.detach()
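
To make the difference concrete, here is a minimal sketch for current PyTorch versions, where Variable is no longer needed. The sizes, the choice of nn.GRUCell, and the helper names rollout and grad_wrt_projection are assumptions made for illustration only; they are not from the original question. It runs the same free-running loop twice, once with the fed-back input detached and once without, and compares the gradient that reaches the projection layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size, nsteps, batch = 8, 16, 5, 4

rnn = nn.GRUCell(input_size, hidden_size)
projection_layer = nn.Linear(hidden_size, input_size)
first_input = torch.randn(batch, input_size)


def rollout(detach_feedback):
    """Run the RNN without teacher forcing, feeding each projection back in."""
    inp = first_input
    hidden = torch.zeros(batch, hidden_size)
    outputs = []
    for _ in range(nsteps):
        hidden = rnn(inp, hidden)
        proj = projection_layer(hidden)
        outputs.append(proj)
        # detach() keeps the values but cuts the graph at the fed-back input.
        inp = proj.detach() if detach_feedback else proj
    return torch.stack(outputs)


def grad_wrt_projection(detach_feedback):
    """Gradient of a dummy loss with respect to the projection weights."""
    loss = rollout(detach_feedback).pow(2).mean()
    return torch.autograd.grad(loss, projection_layer.weight)[0]


out_detached = rollout(detach_feedback=True)
out_full = rollout(detach_feedback=False)
grad_detached = grad_wrt_projection(detach_feedback=True)
grad_full = grad_wrt_projection(detach_feedback=False)

# The forward pass is identical; only the backward pass differs.
same_forward = torch.allclose(out_detached, out_full)
same_backward = torch.allclose(grad_detached, grad_full)
```

In both variants the gradient still flows back through time via the hidden state. Detaching only removes the extra path that runs from one step's projection into the next step's input.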

Gautam_Bhattacharya · March 27, 2018, 3:36pm

Hi @jpeg729 Thanks for the reply.
How would you do it? That is, would you allow the gradient to flow through the output of the previous step or not?
It would seem like maybe you want to do it when training something like a language model.

Thanks

jpeg729 · March 27, 2018, 3:38pm

I have never seen it done, nor heard of it being done, but I would be intrigued to see whether it improves the performance of any sort of autoregressive model (where the previous output is used as input).

Gautam_Bhattacharya · March 27, 2018, 4:31pm

@jpeg729 Hmm. I am trying to do something like scheduled sampling (https://arxiv.org/pdf/1506.03099.pdf), but for video prediction.
I am guessing they allow the gradient to flow through the output of the previous step.
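
For reference, a rough sketch of what a scheduled-sampling loop for continuous frames could look like. This is a simplified illustration, not the paper's implementation: the sizes, the use of nn.GRUCell, the helper name scheduled_sampling_rollout, the teacher_prob and detach_feedback parameters, and the choice of one coin flip per time step for the whole batch are all assumptions. Whether to detach the fed-back prediction is left as a flag, since that is exactly the open question here.

```python
import random

import torch
import torch.nn as nn

torch.manual_seed(0)
random.seed(0)

# Hypothetical sizes, chosen only for illustration.
feat, hidden_size, nsteps, batch = 8, 16, 6, 4

rnn = nn.GRUCell(feat, hidden_size)
projection_layer = nn.Linear(hidden_size, feat)


def scheduled_sampling_rollout(targets, teacher_prob, detach_feedback=True):
    """targets has shape (nsteps + 1, batch, feat); frame t predicts frame t + 1."""
    hidden = torch.zeros(targets.size(1), hidden_size)
    inp = targets[0]
    preds = []
    for t in range(nsteps):
        hidden = rnn(inp, hidden)
        proj = projection_layer(hidden)
        preds.append(proj)
        if random.random() < teacher_prob:
            # Teacher forcing: the next input is the ground-truth frame.
            inp = targets[t + 1]
        else:
            # Free running: the next input is the model's own prediction.
            inp = proj.detach() if detach_feedback else proj
    return torch.stack(preds)


targets = torch.randn(nsteps + 1, batch, feat)  # stand-in for encoded frames
preds = scheduled_sampling_rollout(targets, teacher_prob=0.5)
loss = nn.functional.mse_loss(preds, targets[1:])
loss.backward()
```

In training, teacher_prob would typically start near 1.0 and be decayed over time so that the model gradually sees more of its own predictions.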

jpeg729 · March 28, 2018, 7:32am

I don’t think they do. They seem to treat the previous output as the current input, and as a general rule you don’t backpropagate through your inputs. So unless they say very clearly that they do backpropagate through the previous outputs, I would assume they don’t.

That said, if you are still in doubt, why don’t you email the authors and ask them? If you do, try to pose your question as clearly as possible, because if they don’t understand the question they might not find the time to reply.

Gautam_Bhattacharya · March 28, 2018, 4:27pm

@jpeg729 Thanks for all your help!