Hi. I have a CNN with two cross-entropy losses (due to two separate datasets being trained simultaneously), each with its own FC layer before the final loss computation.
What should I do if I want to freeze one of these losses, i.e. train only one of them without removing either?
- I have set requires_grad of one head to False.
- Should I also set the corresponding layers to evaluation mode? And what about the gradient calculated for the frozen head, should I simply set it to zero?
Thank you.
It depends on your current workflow, i.e. are you using both linear layers, even though you are currently only using a single dataset?
If so, then setting requires_grad=False for the linear layer that should not be used should be enough.
Calling eval() on it shouldn't be necessary if it's just a linear layer, but won't hurt.
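A minimal sketch of that point (layer sizes are arbitrary placeholders): requires_grad=False stops gradients from reaching the frozen head, while eval() only changes the behavior of layers such as dropout or batch norm, not a plain linear layer.

```python
import torch
import torch.nn as nn

frozen_head = nn.Linear(16, 4)
active_head = nn.Linear(16, 6)

# Freeze one head: its parameters will not accumulate gradients
for p in frozen_head.parameters():
    p.requires_grad = False

x = torch.randn(2, 16)
(active_head(x).sum() + frozen_head(x).sum()).backward()

print(frozen_head.weight.grad)          # None: no gradient reaches the frozen head
print(active_head.weight.grad is None)  # False: the active head got a gradient

# eval() changes nothing for a plain linear layer: same output either way
frozen_head.eval()
out_eval = frozen_head(x)
frozen_head.train()
out_train = frozen_head(x)
print(torch.equal(out_eval, out_train))  # True
```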
Thank you for your response.
I have a ResNet with a linear FC layer appended for dataset 1, say FC-Dataset-1:
resnet = ...
in_fea = resnet.logits.in_features
resnet.logits = nn.Linear(in_features=in_fea, out_features=num_sub_1, bias=True)
Meanwhile, there is another linear FC layer for dataset 2, say FC-Dataset-2:
new_logits = nn.Linear(in_features=in_fea, out_features=num_sub_2, bias=True)
To train new_logits only (while freezing FC-Dataset-1), I simply set .requires_grad for FC-Dataset-1 to False as follows.
for name, param in resnet.named_parameters():
    if name in ['logits.weight', 'logits.bias']:
        param.requires_grad = False
    else:
        param.requires_grad = True
And during training, I ignored the output of FC-Dataset-1 as follows.
# x, y_pred_1 = resnet(x, y)
x, _ = resnet(x, y)
y_pred_2 = new_logits(x)
# loss_1 = loss_fn_1(y_pred_1, y)
loss_1 = 0
loss_2 = loss_fn_2(y_pred_2, y)
loss = loss_1 + loss_2
# Involved only 1 optimizer combining parameters for both ends
optimizer.zero_grad()
loss.backward()
optimizer.step()
Is my implementation reasonable? Thank you very much.
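A condensed, runnable version of this setup (placeholder layer sizes; the single optimizer receives the parameters for both ends, and the frozen head simply gets no updates because its gradient stays None):

```python
import torch
import torch.nn as nn

feat = nn.Linear(8, 16)        # stand-in for the resnet trunk
logits = nn.Linear(16, 4)      # FC-Dataset-1, frozen
new_logits = nn.Linear(16, 6)  # FC-Dataset-2, trained

for p in logits.parameters():
    p.requires_grad = False

# One optimizer combining parameters for both ends
optimizer = torch.optim.SGD(
    list(feat.parameters()) + list(logits.parameters()) + list(new_logits.parameters()),
    lr=0.1)

x = torch.randn(4, 8)
y = torch.randint(0, 6, (4,))

optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(new_logits(feat(x)), y)
loss.backward()
optimizer.step()

print(logits.weight.grad)              # None: the frozen head received no gradient
print(new_logits.weight.grad is None)  # False: the trained head received a gradient
```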
I assume you've written a custom resnet model, as you are expecting two outputs?
Note that your current approach will not switch between the linear layers, but will call new_logits on top of resnet.logits. Also, y_pred_1 isn't defined in your code.
If you haven't defined a custom resnet but are using the torchvision implementation, note that this model does not contain a resnet.logits layer and you are simply assigning a new linear layer to this attribute.
Yes, it's a custom resnet model.
The main reason why I do not define y_pred_1 is that dataset 1 and dataset 2 have different subjects. Since the inputs are images from dataset 2, defining y_pred_1 gives a CUDA error due to the subject mismatch.
"Note that your current approach will not switch between the linear layers"
What would you suggest if I want to switch between the two linear layers?
One approach would be to pass a flag to the forward method and use it as a condition to switch between the linear heads in your model.
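A minimal sketch of that flag-based switching (the model below is a simplified stand-in for the custom ResNet; names and sizes are placeholders):

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, in_features=512, num_sub_1=10, num_sub_2=7):
        super().__init__()
        self.backbone = nn.Linear(8, in_features)            # stand-in for the ResNet trunk
        self.logits = nn.Linear(in_features, num_sub_1)      # head for dataset 1
        self.new_logits = nn.Linear(in_features, num_sub_2)  # head for dataset 2

    def forward(self, x, use_head_1=True):
        feat = self.backbone(x)
        # The flag selects which linear head produces the output
        if use_head_1:
            return self.logits(feat)
        return self.new_logits(feat)

model = TwoHeadNet()
x = torch.randn(2, 8)
print(model(x, use_head_1=True).shape)   # torch.Size([2, 10])
print(model(x, use_head_1=False).shape)  # torch.Size([2, 7])
```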
Sorry, I just revised the snippet earlier. The following is my actual implementation, in which I trained only one head while freezing the other.
# x, y_pred_1 = resnet(x, y)
x, _ = resnet(x, y)
y_pred_2 = new_logits(x)
# loss_1 = loss_fn_1(y_pred_1, y)
loss_1 = 0
loss_2 = loss_fn_2(y_pred_2, y)
loss = loss_1 + loss_2
Thanks for the update.
It looks generally alright, but you are still using this approach (assuming y is a sample from Dataset2):
x -> resnet -> output -> criterion -> loss1
y -> resnet -> output -> new_logits -> criterion -> loss2
Is that your workflow, or would you rather switch internally between the last linear layers?
Hmmmm, actually I am pre-training the net using Dataset2, and afterwards the entire net will be activated to train both losses with the two datasets simultaneously.
Does it mean you would remove new_logits after the pretraining is done?
If that's the case, then your model seems to be fine.
No, both heads (resnet.logits and new_logits) will be trained together, with two inputs, one from each of dataset 1 and dataset 2.
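A sketch of that joint phase with both heads active, assuming one batch from each dataset per optimization step (all names and shapes are placeholders):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(8, 16)  # stand-in for the shared ResNet trunk
logits_1 = nn.Linear(16, 4)  # head for dataset 1
logits_2 = nn.Linear(16, 6)  # head for dataset 2
loss_fn = nn.CrossEntropyLoss()

params = (list(backbone.parameters())
          + list(logits_1.parameters())
          + list(logits_2.parameters()))
optimizer = torch.optim.SGD(params, lr=0.1)

# One batch from each dataset per step; both losses are summed and
# backpropagated through the shared backbone
x1, y1 = torch.randn(4, 8), torch.randint(0, 4, (4,))
x2, y2 = torch.randn(4, 8), torch.randint(0, 6, (4,))

optimizer.zero_grad()
loss = loss_fn(logits_1(backbone(x1)), y1) + loss_fn(logits_2(backbone(x2)), y2)
loss.backward()
optimizer.step()
print(torch.isfinite(loss).item())  # True
```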