Loss.backward() updates parameters of a loss term that wasn't included

Right now, I’m trying to train a model using a weighted sum of two loss functions (loss1 and loss2) of the same type, as shown below:

alpha = 0.5
for inputs, labels in data_loader_train:
    out1, out2 = model(inputs)
    loss1 = criterion(out1, labels[:, 0])
    loss2 = criterion(out2, labels[:, 1])
    # convex combination of the two loss terms
    loss = alpha * loss1 + (1 - alpha) * loss2
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

So far, everything works as expected.

But alpha is a tuning parameter. So, when I set alpha=1, I expect only the first loss function loss1 to be used. Yet when I do that, the parameters that feed only loss2, such as self.fc2 and self.bn2, still get updated. Likewise, when I set alpha=0, the parameters that feed only loss1, such as self.fc1 and self.bn1, still get updated.

What is wrong? Any help is much appreciated.


In case it's needed, the following is the forward function of the model, where self.bert is frozen:

def forward(self, x):
    # frozen BERT backbone; keep only the last hidden state
    x = self.bert(x).hidden_states[-1]
    # flatten the token dimension into the feature dimension
    x = x.view(x.shape[0], -1)
    # two independent heads, one per loss term
    out1 = self.fc1(self.dropout(self.bn1(x)))
    out2 = self.fc2(self.dropout(self.bn2(x)))
    return out1, out2
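
For reference, freezing a submodule like self.bert is typically done by disabling gradient tracking for its parameters; the sketch below shows the common pattern only (the post doesn't show how the freezing was actually done):

# Disable gradient tracking for the frozen backbone (e.g. in __init__),
# so backward() never populates .grad for its parameters
for p in self.bert.parameters():
    p.requires_grad_(False)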

I cannot reproduce the issue and the loss weighting works as expected:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 10)
        self.fc2 = nn.Linear(10, 10)

    def forward(self, x):
        out1 = self.fc1(x)
        out2 = self.fc2(x)
        return out1, out2
    
model = MyModel()
x = torch.randn(1, 10)
criterion = nn.MSELoss()
target = torch.randn(1, 10)

# both losses
alpha = 0.5
out1, out2 = model(x)
loss1 = criterion(out1, target)
loss2 = criterion(out2, target)
loss = alpha * loss1 + (1-alpha) * loss2
loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad.abs().sum() if param.grad is not None else None)
model.zero_grad()
# fc1.weight tensor(7.6900)
# fc1.bias tensor(1.3559)
# fc2.weight tensor(7.6103)
# fc2.bias tensor(1.3418)

# loss1
alpha = 1.0
out1, out2 = model(x)
loss1 = criterion(out1, target)
loss2 = criterion(out2, target)
loss = alpha * loss1 + (1-alpha) * loss2
loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad.abs().sum() if param.grad is not None else None)
model.zero_grad()
# fc1.weight tensor(15.3799)
# fc1.bias tensor(2.7117)
# fc2.weight tensor(0.)
# fc2.bias tensor(0.)

# loss2
alpha = 0.0
out1, out2 = model(x)
loss1 = criterion(out1, target)
loss2 = criterion(out2, target)
loss = alpha * loss1 + (1-alpha) * loss2
loss.backward()

for name, param in model.named_parameters():
    print(name, param.grad.abs().sum() if param.grad is not None else None)
model.zero_grad()
# fc1.weight tensor(0.)
# fc1.bias tensor(0.)
# fc2.weight tensor(15.2206)
# fc2.bias tensor(2.6836)

Could you post a minimal, executable code snippet showing the issue?
If you are checking for parameter updates directly, note that optimizers with internal state (e.g., Adam) might update parameters even when their gradient is zero, if those parameters were updated before and thus carry a running internal state.
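
To illustrate that effect, here is a minimal sketch with a single parameter (the learning rate and gradient values are arbitrary):

import torch
import torch.nn as nn

# A single parameter optimized with Adam
param = nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=0.1)

# First step with a non-zero gradient populates Adam's running averages
param.grad = torch.ones(1)
optimizer.step()
print(param.item())  # parameter changed, as expected

# Second step with an explicitly zero gradient: the running averages
# from the first step are still non-zero, so Adam moves the parameter anyway
param.grad = torch.zeros(1)
optimizer.step()
print(param.item())  # parameter changed again, even though grad == 0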

What an amazing answer from an amazing person!

I think the last paragraph of your reply explains my problem: I was re-running the training loop with different alpha values while keeping the same model and optimizer, so Adam's internal state kept updating the "unused" parameters. Once I redefined the whole model (and its optimizer) for each new alpha, everything worked as expected!
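
For anyone hitting the same pitfall, this is roughly what the corrected sweep looks like (a sketch only; MyModel, the learning rate, criterion, and data_loader_train stand in for my actual setup):

import torch

for alpha in (0.0, 0.5, 1.0):
    # fresh model and optimizer per alpha value, so Adam's running
    # averages from a previous run cannot keep updating parameters
    model = MyModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for inputs, labels in data_loader_train:
        out1, out2 = model(inputs)
        loss1 = criterion(out1, labels[:, 0])
        loss2 = criterion(out2, labels[:, 1])
        loss = alpha * loss1 + (1 - alpha) * loss2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()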