However, alpha is a tuning parameter. When I set alpha=1, I want to use only the first loss function, loss1, yet I find that all parameters related to loss2, such as self.fc2 and self.bn2, still get updated. The same happens when I set alpha=0: all parameters related to loss1, such as self.fc1 and self.bn1, still get updated.
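For context, the training step looks roughly like this (the criteria, data loader, and learning rate below are simplified placeholders for my actual code; alpha is the tuning parameter):

import torch
from torch import nn

# model, loader, and alpha are defined elsewhere
criterion1 = nn.CrossEntropyLoss()
criterion2 = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for x, y1, y2 in loader:
    optimizer.zero_grad()
    out1, out2 = model(x)
    loss = alpha * criterion1(out1, y1) + (1 - alpha) * criterion2(out2, y2)
    loss.backward()
    optimizer.step()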
What is wrong? Any help is much appreciated.
In case it's needed, here is the forward function of the model; self.bert is frozen:
def forward(self, x):
    x = self.bert(x).hidden_states[-1]          # last hidden state of the frozen BERT
    x = x.view(x.shape[0], -1)                  # flatten to (batch_size, features)
    out1 = self.fc1(self.dropout(self.bn1(x)))  # head 1, used by loss1
    out2 = self.fc2(self.dropout(self.bn2(x)))  # head 2, used by loss2
    return out1, out2
Could you post a minimal, executable code snippet showing the issue?
If you are checking for parameter updates directly, note that optimizers with internal states (e.g. Adam) might still update parameters whose gradient is zero, if those parameters were updated before and therefore have a running internal state.
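As a small standalone illustration of this effect (a toy parameter, not your model): after a few Adam steps with a nonzero gradient, a step with an all-zero gradient still changes the parameter because of the accumulated running averages.

import torch

p = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.Adam([p], lr=0.1)

# A few steps with a nonzero gradient build up Adam's internal running averages.
for _ in range(3):
    opt.zero_grad()
    p.sum().backward()   # gradient of 1.0 for each element
    opt.step()

# Now take a step with an explicitly zero gradient.
before = p.detach().clone()
p.grad = torch.zeros_like(p)
opt.step()
print(torch.equal(before, p.detach()))  # False: the parameter still moved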
I think the last paragraph in your reply explains my problem: I was re-running the for-loop with different alpha values while reusing the same model. Once I redefined the whole model with the new alpha, everything worked as expected!
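In case it helps anyone else, the fixed loop is roughly the following (MyModel, train, and the hyperparameters are placeholders for my actual code):

import torch

for alpha in (0.0, 0.5, 1.0):
    model = MyModel(alpha=alpha)                                # fresh parameters for each run
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # fresh optimizer state as well
    train(model, optimizer)                                     # the training loop from my first post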