Hi, I have a question about DistributedDataParallel.
I implemented the code below on a single GPU (it combines two different losses):
```python
def train(model, criterion, ...):
    model.train()
    outputs_a = model(inputs_a)
    loss_a = criterion(outputs_a, targets)
    ...
    outputs_b = model(inputs_b)
    loss_b = criterion(outputs_b, targets)
    ...
    total_loss = (lam * loss_a) + ((1 - lam) * loss_b)
    total_loss.backward()
    ...
```
It runs well without any problems.
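For reference, here is a minimal self-contained version of this pattern; the toy model, dummy data, and the mixing weight `lam` are stand-ins, not my real code:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model, data, and mixing weight.
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs_a, inputs_b = torch.randn(8, 4), torch.randn(8, 4)
targets = torch.randn(8, 2)
lam = 0.7

model.train()
optimizer.zero_grad()
loss_a = criterion(model(inputs_a), targets)    # first forward pass
loss_b = criterion(model(inputs_b), targets)    # second forward pass
total_loss = lam * loss_a + (1 - lam) * loss_b  # weighted combination
total_loss.backward()                           # single backward pass
optimizer.step()
```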
Afterwards, when I tried to use multiple GPUs with DistributedDataParallel, it threw an error like this:
```
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor ] is at version 5; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
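Following the hint, anomaly detection can be switched on once before the training loop (a minimal sketch; note the API name is `torch.autograd.set_detect_anomaly`):

```python
import torch

# Enable once before training; backward errors will then also report the
# traceback of the forward operation that produced the offending tensor.
torch.autograd.set_detect_anomaly(True)
```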
I found that this error is caused by an in-place operation,
but I can't find any in-place operation in my code!
So I changed my code as below:
```python
def train(model, criterion, ...):
    model.train()
    outputs_a = model(inputs_a)
    loss_a = criterion(outputs_a, targets)
    loss_a.backward()
    ...
    outputs_b = model(inputs_b)
    loss_b = criterion(outputs_b, targets)
    loss_b.backward()
    ...
    total_loss = loss_a + loss_b
    ...
```
Fortunately, it runs well without any errors.
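For context, the multi-GPU version wraps the model in the usual way (a sketch assuming a torchrun-style launch that sets `LOCAL_RANK`; the toy `Linear` stands in for my real model):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Standard DDP wrapping, assuming one process per GPU launched via torchrun.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
model = DDP(nn.Linear(4, 2).cuda(local_rank), device_ids=[local_rank])
```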
However, I have a question.
```python
# 1
total_loss = loss_a + loss_b
total_loss.backward()

# 2
loss_a.backward()
loss_b.backward()
total_loss = loss_a + loss_b
```
Are these two operations the same?
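To make the question concrete, here is the kind of single-GPU check I have in mind (toy model and data again): since `.backward()` accumulates gradients into `.grad`, I would expect the two approaches to match on one device, but I am not sure whether the same holds under DistributedDataParallel:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
inputs_a, inputs_b = torch.randn(8, 4), torch.randn(8, 4)
targets = torch.randn(8, 2)

# 1: single backward on the summed loss
model.zero_grad()
loss_a = criterion(model(inputs_a), targets)
loss_b = criterion(model(inputs_b), targets)
(loss_a + loss_b).backward()
grads_1 = [p.grad.clone() for p in model.parameters()]

# 2: two backward calls; gradients accumulate into .grad
model.zero_grad()
criterion(model(inputs_a), targets).backward()
criterion(model(inputs_b), targets).backward()
grads_2 = [p.grad.clone() for p in model.parameters()]

# Expected to print True on a single device
print(all(torch.allclose(g1, g2) for g1, g2 in zip(grads_1, grads_2)))
```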