Say I calculate 4 losses for each batch as shown below:
losses_list = []
some_losses_fns = [loss_fn1, loss_fn2, loss_fn3, loss_fn4]
for loss_fn in some_losses_fns:
    # compute each loss and collect it
    loss = loss_fn(preds, true_labels)
    losses_list.append(loss)
# note: the losses are 0-dim tensors, so torch.stack is needed here
# (torch.cat raises an error on zero-dimensional tensors)
average_loss = torch.stack(losses_list).mean()
Are the gradients of the losses going to be backpropagated properly if I call
.item() on them and then call
requires_grad_() on the average loss?
I am trying to reduce the memory footprint of my model, and I found this suggestion online, but I wanted to double-check whether it is legit. Any other suggestions? Maybe
del the
loss variable after it is appended to the
losses_list?
Thanks in advance!
In short, it won’t work using
item(). You need to keep tensors all along, otherwise you’ll lose the computational graph (DAG) that autograd uses when you call backward (see the autograd mechanics section in the docs).
The item() method returns a standard Python number, which is indeed not compatible with backward.
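A minimal sketch of the difference, using hypothetical toy loss functions (the names `loss_fns`, `preds`, `true_labels` mirror the snippet above but are invented here):

```python
import torch

# A single trainable parameter and some toy predictions/targets.
w = torch.tensor(2.0, requires_grad=True)
preds = w * torch.tensor([1.0, 2.0])
true_labels = torch.tensor([1.5, 2.5])

# Hypothetical stand-ins for loss_fn1..loss_fn4.
loss_fns = [
    lambda p, t: ((p - t) ** 2).mean(),  # MSE-style
    lambda p, t: (p - t).abs().mean(),   # L1-style
]

# Keeping tensors preserves the graph, so backward works.
losses = [fn(preds, true_labels) for fn in loss_fns]
average_loss = torch.stack(losses).mean()
average_loss.backward()
print(w.grad is not None)  # True

# Calling .item() returns a plain Python float: no graph attached,
# so there is nothing for autograd to backpropagate through.
detached = [fn(preds, true_labels).item() for fn in loss_fns]
print(type(detached[0]))  # <class 'float'>
```

Calling requires_grad_() on a tensor rebuilt from those floats would just mark a fresh leaf tensor; it cannot reconnect it to the parameters.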
You can safely call
del on the
loss after it is appended to the
losses_list, but my guess is you won’t notice any significant change in memory footprint.
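To illustrate why the saving is negligible, here is a sketch of that pattern (the setup names are assumed for the example):

```python
import torch
import torch.nn.functional as F

# Hypothetical setup mirroring the snippet in the question.
preds = torch.randn(8, requires_grad=True)
true_labels = torch.randn(8)
loss_fns = [F.mse_loss, F.l1_loss]

losses_list = []
for loss_fn in loss_fns:
    loss = loss_fn(preds, true_labels)
    losses_list.append(loss)
    # `del` only removes the local name; `losses_list` still holds a
    # reference to the tensor (and its graph), so the memory stays alive.
    del loss

average_loss = torch.stack(losses_list).mean()
average_loss.backward()  # still works: the graph was never freed
```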
I don’t see any way to reduce the memory footprint on this part of your code.
[Edit] By the way, look at the documentation for the item method; it clearly states “This operation is not differentiable.”
Thanks for your answer. Yep, my guess was that the
item() method would destroy the DAG. I got this from the article Memory Management, Optimisation and Debugging with PyTorch.
I found my way around this problem by utilising GPU nodes with more memory, so that my pods are not evicted, but of course at a higher price.