Hi, I have a question about DistributedDataParallel.
I implemented the code below on a single GPU (it combines two different losses):
```python
def train(model, criterion, ...):
    model.train()
    outputs_a = model(inputs_a)
    loss_a = criterion(outputs_a, targets)
    ...
    outputs_b = model(inputs_b)
    loss_b = criterion(outputs_b, targets)
    ...
    total_loss = (lam * loss_a) + ((1 - lam) * loss_b)
    total_loss.backward()
    ...
```
It runs well without any problems.
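For reference, here is a minimal self-contained version of this pattern; the toy model, dummy data, and the mixing weight `lam` are stand-ins, not my real code:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model, data, and mixing weight.
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs_a, inputs_b = torch.randn(8, 4), torch.randn(8, 4)
targets = torch.randn(8, 2)
lam = 0.7

model.train()
optimizer.zero_grad()
loss_a = criterion(model(inputs_a), targets)    # first forward pass
loss_b = criterion(model(inputs_b), targets)    # second forward pass
total_loss = lam * loss_a + (1 - lam) * loss_b  # weighted combination
total_loss.backward()                           # single backward pass
optimizer.step()
```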
Afterwards, when I tried to use multiple GPUs with DistributedDataParallel, it threw an error like this:
```
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor ] is at version 5; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
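Following the hint, anomaly detection can be switched on once before the training loop (a minimal sketch; note the API name is `torch.autograd.set_detect_anomaly`):

```python
import torch

# Enable once before training; backward errors will then also report the
# traceback of the forward operation that produced the offending tensor.
torch.autograd.set_detect_anomaly(True)
```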
I found that this error is caused by an in-place operation,
but I can't find any in-place operation in my code!
So I changed my code as below:
```python
def train(model, criterion, ...):
    model.train()
    outputs_a = model(inputs_a)
    loss_a = criterion(outputs_a, targets)
    loss_a.backward()
    ...
    outputs_b = model(inputs_b)
    loss_b = criterion(outputs_b, targets)
    loss_b.backward()
    ...
    total_loss = loss_a + loss_b
    ...
```
Fortunately, it runs well without any errors.
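For context, the multi-GPU version wraps the model in the usual way (a sketch assuming a torchrun-style launch that sets `LOCAL_RANK`; the toy `Linear` stands in for my real model):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Standard DDP wrapping, assuming one process per GPU launched via torchrun.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
model = DDP(nn.Linear(4, 2).cuda(local_rank), device_ids=[local_rank])
```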
However, I have a question.
```python
# 1
total_loss = loss_a + loss_b
total_loss.backward()

# 2
loss_a.backward()
loss_b.backward()
total_loss = loss_a + loss_b
```
Are these two operations the same?
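To make the question concrete, here is the kind of single-GPU check I have in mind (toy model and data again): since `.backward()` accumulates gradients into `.grad`, I would expect the two approaches to match on one device, but I am not sure whether the same holds under DistributedDataParallel:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
criterion = nn.MSELoss()
inputs_a, inputs_b = torch.randn(8, 4), torch.randn(8, 4)
targets = torch.randn(8, 2)

# 1: single backward on the summed loss
model.zero_grad()
loss_a = criterion(model(inputs_a), targets)
loss_b = criterion(model(inputs_b), targets)
(loss_a + loss_b).backward()
grads_1 = [p.grad.clone() for p in model.parameters()]

# 2: two backward calls; gradients accumulate into .grad
model.zero_grad()
criterion(model(inputs_a), targets).backward()
criterion(model(inputs_b), targets).backward()
grads_2 = [p.grad.clone() for p in model.parameters()]

# Expected to print True on a single device
print(all(torch.allclose(g1, g2) for g1, g2 in zip(grads_1, grads_2)))
```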