To minimize GPU memory usage, how should I sum all the losses?

for epoch in range(epochs):
    for step, data in enumerate(dataloader):
        ...
        total_loss = criterion(input, target)  # 1st loss

        second_loss = criterion(input2, target2).item()  # 2nd loss as a Python float
        total_loss += second_loss
        del second_loss

        third_loss = criterion(input3, target3).item()  # 3rd loss as a Python float
        total_loss += third_loss
        del third_loss
        ...
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

In your current approach you are adding second_loss and third_loss to total_loss as Python floats (detached floating point constants), which won’t have any effect on the backward pass.
I’m not familiar with your use case, but if you want these losses to participate in the gradient calculation, you shouldn’t call .item() on them.
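
For example, here is a minimal sketch of the loop with all losses kept as tensors so they contribute to the gradients (the input2/target2/input3/target3 names are taken from your snippet and assumed to come from your data and model):

for epoch in range(epochs):
    for step, data in enumerate(dataloader):
        optimizer.zero_grad()

        total_loss = criterion(input, target)                  # 1st loss, tensor
        total_loss = total_loss + criterion(input2, target2)   # 2nd loss, tensor
        total_loss = total_loss + criterion(input3, target3)   # 3rd loss, tensor

        total_loss.backward()
        optimizer.step()

        # Use .item() only for logging, after backward(), so nothing is
        # detached from the computation graph.
        loss_value = total_loss.item()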

Are you actually running out of memory when adding these losses directly to total_loss? That would be surprising, since the scalar loss tensors should use very little memory compared to the model itself and the intermediate activations (though it also depends on which criterion you are using).
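
If you do see memory pressure, one alternative pattern worth sketching (not taken from your code, and assuming the three losses come from separate forward passes that don’t share intermediate activations) is to call backward() on each loss separately, so each loss’s graph can be freed right away while the gradients accumulate in .grad:

optimizer.zero_grad()

loss1 = criterion(input, target)
loss1.backward()   # frees the activations behind loss1

loss2 = criterion(input2, target2)
loss2.backward()

loss3 = criterion(input3, target3)
loss3.backward()

optimizer.step()

# Note: if the losses share parts of one forward graph, the later backward()
# calls would need retain_graph=True, which removes the memory benefit.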