It does look like what you want is very similar to truncated backpropagation through time (TBPTT), except that in your case the steps are not timesteps but optimization steps.
You can check this post that gives an example of how to do it manually: Implementing Truncated Backpropagation Through Time - #4 by albanD
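For concreteness, here is a minimal sketch of that idea transposed to optimization steps (the names `k1`, `k2`, the learning rate, and the quadratic inner loss are illustrative assumptions, not your code): unroll the update steps differentiably, but detach the parameter at the truncation point so gradients only flow through the last few steps.

```python
import torch

# Sketch: unroll k2 differentiable update steps, but backpropagate
# only through the last k1 of them by detaching at the truncation
# point. k1, k2, lr and the quadratic loss are illustrative.
w = torch.randn(5, requires_grad=True)
lr, k2, k1 = 0.1, 10, 3

param = w
for step in range(k2):
    if step == k2 - k1:
        # Truncation point: cut the graph of all earlier steps.
        param = param.detach().requires_grad_()
    loss = (param ** 2).sum()  # stand-in for the real inner loss
    (grad,) = torch.autograd.grad(loss, param, create_graph=True)
    param = param - lr * grad  # differentiable update step

# This backward only traverses the last k1 update steps.
(param ** 2).sum().backward()
```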
```python
data_loader.tensors[0][idx_last_k1, :].detach_()
```
I am not sure what you're trying to achieve here, but it most likely doesn't work. The indexing with `[idx_last_k1, :]` returns a temporary view of the original tensor, and you then detach that temporary view in place. But that does not modify the original Tensor!
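A minimal sketch of the problem and of an out-of-place alternative, assuming `idx_last_k1` is a tensor of row indices (the variable names below are illustrative stand-ins for `data_loader.tensors[0]`):

```python
import torch

stored = torch.randn(4, 3, requires_grad=True) * 2  # carries grad history
idx_last_k1 = torch.tensor([0, 1])

# The indexing produces a temporary Tensor; detaching it in place
# leaves `stored` untouched.
stored[idx_last_k1, :].detach_()
print(stored.requires_grad)  # True -- nothing changed

# To actually cut the history, detach out of place and rebind the
# name that you later use in the forward pass:
stored = stored.detach()
print(stored.requires_grad)  # False
```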
whether or not I have used `retain_graph=True` correctly
You should only ever use it if you compute gradients through the same graph multiple times. If you need it anywhere else, something is wrong.
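For illustration, here is a minimal case where `retain_graph=True` is legitimately needed: two backward passes through the same graph.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

y.backward(retain_graph=True)  # keep the graph for the second pass
y.backward()                   # reuses the retained graph

print(x.grad)  # gradients from both passes accumulate here
```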