I need some advice.
I'm currently using retain_graph=True in my code, simply because I need the gradients of all previous tensors.
The reason is that my variables come from a recursion, where each variable depends on the previous one, all the way back to the first variable.
(I hope I was able to explain this.)
I want to get rid of retain_graph=True, as it adds a lot of computation time.
Is there a way I could combine all the gradients and pass them on to the next iteration, or something similar, that would let me drop retain_graph=True?
I’m not sure I understand the use case completely, so please correct me if I’m wrong.
retain_graph is used to keep the computation graph in case you would like to call backward using this graph again.
A typical use case would be multiple losses, where the second backward call still needs the intermediate tensors to compute the gradients.
Each backward call computes the gradients for the parameters. If you don't zero out the gradients, each following backward will accumulate the gradients in all parameters.
So I’m not sure, if you really need to retain the graph or if you just want to accumulate the grads.
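To make the distinction concrete, here is a minimal sketch (using a toy Linear model, not your actual setup) showing that summing the losses and calling backward once accumulates exactly the same gradients as two backward calls with retain_graph=True:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

# Option A: two backward calls on the same graph; the first must
# retain the graph so the second can reuse the intermediate tensors.
out = model(x)
loss1 = out.pow(2).mean()
loss2 = out.abs().mean()
loss1.backward(retain_graph=True)
loss2.backward()  # gradients accumulate in .grad
grad_a = model.weight.grad.clone()

# Option B: sum the losses and call backward once -- no retain_graph
# needed, and the accumulated gradients are identical.
model.zero_grad()
out = model(x)
(out.pow(2).mean() + out.abs().mean()).backward()
grad_b = model.weight.grad.clone()

print(torch.allclose(grad_a, grad_b))  # True
```

So if accumulation is all you need, option B avoids keeping the graph alive at all.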
I have the same problem. How did you fix it? I'm trying to backprop gradients through an NLMS filter whose parameters are recursively updated.
Thanks for your reply. I believe we have the same issue; the code can be written as follows:

for batch in self.trainset:
    for i in range(seq):
        out1, out2 = net(input1[i], input2[i])
        out3, W = adapt_filter(out1, out2, W)
        loss = mse_loss(out3, target)

If seq is large, e.g. 1000, a huge number of computation graphs will be saved, which can cause CUDA out of memory and slow down the training process. Do you have any suggestions to avoid it? Hope I made it clear. Thanks again.
The increasing memory usage is expected, since retain_graph=True keeps the computation graph alive instead of freeing it after each backward call. Could you explain your use case a bit more and why you are setting this argument?
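If you don't actually need gradients through the full history of W, one common way to keep memory constant is to detach the recursive state after each step (truncated backpropagation). A minimal sketch, where step is a hypothetical stand-in for the net/adapt_filter combination above:

```python
import torch

# Hypothetical stand-in for the posted net + adapt_filter step:
# produces an output and a recursively updated state W.
def step(x, W):
    out = (x * W).sum()
    W_new = W + 0.1 * x  # recursive parameter update
    return out, W_new

x_seq = torch.randn(1000, 4)
W = torch.zeros(4, requires_grad=True)
target = torch.tensor(0.0)

for x in x_seq:
    out, W = step(x, W)
    loss = (out - target) ** 2
    loss.backward()  # no retain_graph needed
    # Detach W so the next iteration starts a fresh graph; the old
    # graph is freed and memory stays constant across the sequence.
    W = W.detach().requires_grad_()
```

This only backpropagates through the current step, so it changes the gradients compared to unrolling the full recursion; whether that truncation is acceptable depends on your filter.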