Reducing the memory footprint for multiple loss functions

Hi all,

Say I calculate 4 losses for each batch as shown below:

losses_list = []
some_losses_fns = [loss_fn1, loss_fn2, loss_fn3, loss_fn4]
for loss_fn in some_losses_fns:
  # assume loss is computed
  loss = loss_fn(preds, true_labels)
  losses_list.append(loss)
average_loss = sum(losses_list) / len(losses_list)

Will the gradients of the losses still be backpropagated properly if I call .item() on each of them and then call requires_grad_() on the average loss?

I am trying to reduce the memory footprint of my model. I found this suggestion online, but I wanted to double-check whether it is legitimate. Any other suggestions? Maybe del the loss variable after it is appended to losses_list?

Thanks in advance!


In short, it won’t work with item(). You need to keep the tensors all along; otherwise you lose the computational graph (DAG) that autograd uses when you call backward() (see the autograd mechanics section of the docs). The item() method returns a standard Python number, which is indeed not compatible with backward().
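To make the contrast concrete, here is a minimal sketch. The model, loss functions, and data are illustrative stand-ins, not the original poster's code: keeping the loss tensors lets backward() reach the parameters, while .item() hands back plain Python floats with no graph attached.

```python
import torch

# Toy model and data, purely for illustration.
model = torch.nn.Linear(4, 1)
preds = model(torch.randn(8, 4))
true_labels = torch.randn(8, 1)
loss_fns = [torch.nn.functional.mse_loss, torch.nn.functional.l1_loss]

# Correct: keep the loss tensors so the graph survives until backward().
losses = [fn(preds, true_labels) for fn in loss_fns]
average_loss = torch.stack(losses).mean()
average_loss.backward()  # gradients now populate model.weight.grad etc.

# Broken: .item() returns a detached Python float; the graph is gone,
# so there is nothing to call backward() on, and requires_grad_() on a
# freshly built tensor of these numbers cannot recover it.
detached = [fn(preds, true_labels).item() for fn in loss_fns]
avg = sum(detached) / len(detached)  # just a number, no autograd history
```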

You can safely use del on the loss after it is appended to the losses_list, but my guess is you won’t notice any significant change in memory footprint.

I don’t see any way to reduce the memory footprint in this part of your code.
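If peak memory really is dominated by holding all four loss graphs at once, one legitimate (if more compute-hungry) pattern is to re-run the forward pass per loss and call backward() immediately, so each graph is freed right away while gradients accumulate in .grad. This is a hedged sketch with made-up names (model, x, y), not the poster's setup; scaling each loss by 1/N makes the accumulated gradient equal the gradient of the mean.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss_fns = [torch.nn.functional.mse_loss,
            torch.nn.functional.l1_loss,
            torch.nn.functional.smooth_l1_loss,
            torch.nn.functional.huber_loss]

# Reference: average all losses, one backward (higher peak memory,
# since all four graphs are alive until backward()).
preds = model(x)
ref_loss = torch.stack([fn(preds, y) for fn in loss_fns]).mean()
ref_loss.backward()
ref_grad = model.weight.grad.clone()

# Memory-saving variant: recompute the forward per loss and free each
# graph immediately. Gradients sum into .grad across the calls, so the
# end result matches backprop through the average loss.
model.zero_grad()
for fn in loss_fns:
    loss = fn(model(x), y) / len(loss_fns)
    loss.backward()  # this loss's graph is freed here
```

The trade-off is one extra forward pass per loss, which is often acceptable when the alternative is an out-of-memory eviction.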


[Edit] By the way, look at the documentation for the item() method; it clearly states “This operation is not differentiable.”

Hi @Azerus,

Thanks for your answer. Yep, my guess was that the item() method would destroy the DAG. I got the idea from this article: Memory Management, Optimisation and Debugging with PyTorch

I found my way around this problem by utilising GPU nodes with more memory, so that my pods are not evicted, though at a higher price, of course :grin: