Hi @albanD,
I went through Implementing Truncated Backpropagation Through Time as you suggested, but I am a little confused about this part:
> this is not retro-active: you need to detach() (or detach_() which is just a shorthand for `x = x.detach()`) before using the Tensor, otherwise it won't have any effect.
So let me rephrase my question and see if it makes sense:
How do I detach only certain indices/slices of the original Tensor? The reason I am asking is that, at a given training iteration, I only want to track gradient updates corresponding to the last `k1` batches of inputs (`x`'s). So I am interested in keeping `requires_grad=True` only for those inputs (`x`) and having `requires_grad=False` for the rest of the inputs.
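
To make the question concrete, here is a minimal sketch of the kind of thing I mean (the shape, `k1` value, and variable names are just placeholders): build a new tensor where the earlier slices are `detach()`-ed and only the last `k1` slices keep their gradient history, then use that tensor in the forward pass.

```python
import torch

x = torch.randn(5, 3, requires_grad=True)
k1 = 2  # hypothetical: keep gradient tracking only for the last k1 rows

# Concatenate a detached copy of the early slices with the live last k1
# slices; gradients will only flow back through the last k1 rows of x.
x_mixed = torch.cat([x[:-k1].detach(), x[-k1:]], dim=0)

loss = x_mixed.sum()
loss.backward()

# x.grad is zero for the detached rows and one for the last k1 rows
print(x.grad)
```

I realize this creates a new tensor rather than truly detaching part of `x` in place, which is why I am asking whether there is a more direct way.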