Hi, I have a question about DistributedDataParallel.
I implemented the code below on a single GPU (it combines two different losses):
def train(model, criterion, ...):
    model.train()
    outputs_a = model(inputs_a)
    loss_a = criterion(outputs_a, targets)
    ...
    outputs_b = model(inputs_b)
    loss_b = criterion(outputs_b, targets)
    ...
    total_loss = (lam * loss_a) + ((1 - lam) * loss_b)
    total_loss.backward()
    ...
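(For context, a minimal runnable version of this step, with a toy model and random data standing in for my real ones, would look roughly like:)

import torch
import torch.nn as nn

# Toy stand-ins just to make the snippet self-contained
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs_a = torch.randn(8, 16)
inputs_b = torch.randn(8, 16)
targets = torch.randint(0, 4, (8,))
lam = 0.7  # mixing coefficient between the two losses

model.train()
optimizer.zero_grad()

outputs_a = model(inputs_a)
loss_a = criterion(outputs_a, targets)

outputs_b = model(inputs_b)
loss_b = criterion(outputs_b, targets)

# Weighted sum of both losses, then a single backward pass
total_loss = (lam * loss_a) + ((1 - lam) * loss_b)
total_loss.backward()
optimizer.step()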
It runs well without any problems.
Afterwards, when I try to use multiple GPUs with DistributedDataParallel, it throws an error like this:
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512]] is at version 5; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
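(For reference, the hint refers to enabling anomaly detection like this, before running the forward/backward pass:)

import torch

# Turn on autograd anomaly detection; the backward() that fails will then
# also print the forward trace of the op that produced the bad tensor.
torch.autograd.set_detect_anomaly(True)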
I found that this error is caused by an inplace operation,
but I can't find any inplace operation in my code!
So I changed my code to the version below:
def train(model, criterion, ...):
    model.train()
    outputs_a = model(inputs_a)
    loss_a = criterion(outputs_a, targets)
    loss_a.backward()
    ...
    outputs_b = model(inputs_b)
    loss_b = criterion(outputs_b, targets)
    loss_b.backward()
    ...
    total_loss = loss_a + loss_b
    ...
Fortunately, this version runs without any error.
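(In the same runnable toy setup as the earlier sketch, the changed training step is just:)

model.train()
optimizer.zero_grad()

outputs_a = model(inputs_a)
loss_a = criterion(outputs_a, targets)
loss_a.backward()  # grads for loss_a land in each parameter's .grad

outputs_b = model(inputs_b)
loss_b = criterion(outputs_b, targets)
loss_b.backward()  # grads for loss_b are accumulated on top

total_loss = loss_a + loss_b  # kept only for logging; no backward on this sum
optimizer.step()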
However, I have a question.
#1
total_loss = (loss_a + loss_b)
total_loss.backward()
#2
loss_a.backward()
loss_b.backward()
total_loss = loss_a + loss_b
Are these two versions equivalent, i.e., do they compute the same gradients?
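To make the question concrete, this is the kind of check I have in mind, on a toy model outside DDP (and without the lam weighting, matching #1/#2 above):

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model1 = nn.Linear(16, 4)
model2 = copy.deepcopy(model1)  # identical weights for a fair comparison
criterion = nn.CrossEntropyLoss()

inputs_a = torch.randn(8, 16)
inputs_b = torch.randn(8, 16)
targets = torch.randint(0, 4, (8,))

# 1: sum the losses, single backward
loss_a = criterion(model1(inputs_a), targets)
loss_b = criterion(model1(inputs_b), targets)
(loss_a + loss_b).backward()

# 2: backward each loss separately; .grad accumulates across the two calls
loss_a = criterion(model2(inputs_a), targets)
loss_b = criterion(model2(inputs_b), targets)
loss_a.backward()
loss_b.backward()

for p1, p2 in zip(model1.parameters(), model2.parameters()):
    print(torch.allclose(p1.grad, p2.grad, atol=1e-6))  # expect: True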