Hi! I am currently facing a problem where I do something like this:
for batch_of_inputs in dataset:
batch_of_inputs .requires_grad_()
batch_of_outputs=model(preprocess(batch_of_inputs ))
# compute the gradient of the output with respect to the inputs
gradients=torch.autograd.grad(outputs=batch_of_outputs,
inputs=batch_of_outputs,
grad_outputs=torch.ones_like(batch_of_outputs),
retain_graph=True)[0]
save_to_disk(gradients)
And the thing is that GPU memory keeps increasing as the batches are processed until no more memory is available and the process crashes.
As far as I understand, the problem is that all the gradients are being tracked and stored and the memory is not cleaned after each iteration. How should I delete de gradients so that they don’t use GPU memory once they are saved to disk?
Thanks in advance