CNN with Two Losses

Hi. I have a CNN with two cross-entropy (CE) losses (due to two separate datasets being trained simultaneously), each with an individual FC layer prior to the final CE loss computation.

What should I do if I want to freeze one of these losses, i.e. train only one of them without removing either?

  1. I have set the requires_grad of one end to be False.

  2. Should I set the corresponding layers to evaluation mode? And what about the gradient calculated for the one to be frozen: should I simply set it to zero?

Thank you.


It depends on your current workflow.
I.e. are you using both linear layers, even though you are currently only using a single Dataset?
If so, then setting requires_grad=False for the linear layer that should not be used should be enough.
Calling eval() on it shouldn't be necessary if you are only using a linear layer, but it won't hurt.
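As a minimal sketch (the head names and feature size here are assumptions, not from the thread): freezing one head via requires_grad=False means it accumulates no gradients, even if its output contributes to the loss:

```python
import torch
import torch.nn as nn

feat_dim = 512                      # assumed backbone feature size
head_1 = nn.Linear(feat_dim, 10)    # head to freeze
head_2 = nn.Linear(feat_dim, 5)     # head to keep training

for param in head_1.parameters():
    param.requires_grad = False     # freeze head_1 only

x = torch.randn(4, feat_dim)
loss = head_1(x).sum() + head_2(x).sum()
loss.backward()

print(head_1.weight.grad)              # None: the frozen head received no gradient
print(head_2.weight.grad is not None)  # True: the active head did
```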


Thank you for your response.

I have a resnet with an appended linear FC layer for dataset 1, say FC-Dataset-1.

resnet = ...
in_fea = resnet.logits.in_features
resnet.logits = nn.Linear(in_features = in_fea, out_features = num_sub_1, bias = True)

In the meantime, there is another linear FC layer for dataset 2, say FC-Dataset-2

new_logits = nn.Linear(in_features = in_fea, out_features = num_sub_2, bias = True)

To train new_logits only (while freezing FC-Dataset-1), I simply set .requires_grad for FC-Dataset-1 to False as follows.

for name, param in resnet.named_parameters():
    if name in ['logits.weight', 'logits.bias']:
        param.requires_grad = False
    else:
        param.requires_grad = True

And during training, I ignored the output for FC-Dataset-1 as follows.

# x, y_pred_1 = resnet(x, y)
x, _ = resnet(x, y) 
y_pred_2 = new_logits(x)

# loss_1 = loss_fn_1(y_pred_1, y)
loss_1 = 0
loss_2 = loss_fn_2(y_pred_2, y)
loss = loss_1 + loss_2

# Only one optimizer, combining the parameters for both ends
optimizer.zero_grad()  
loss.backward()
optimizer.step()

Is my implementation reasonable? Thank you very much.

I assume you’ve written a custom resnet model, as you are expecting two outputs?

Note that your current approach will not switch between the linear layers, but call new_logits on top of resnet.logits. Also, y_pred_1 isn't defined in your code.

If you haven’t defined a custom resnet but are using the torchvision implementation, note that this model does not contain a resnet.logits layer and you are simply assigning a new linear layer to this attribute.

Yes, it's a custom resnet model.

The main reason why I do not define y_pred_1 is that dataset 1 and dataset 2 have different subjects.

Since the inputs are images from dataset 2, defining y_pred_1 gives a CUDA error due to the subject mismatch.

Note that your current approach will not switch between the linear layers,

What would you suggest if I am to switch between the two linear layers?

One approach would be to pass a flag to the forward method and use it as a condition to switch between the linear heads in your model.
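A sketch of that flag-based approach (the module layout, names, and sizes are illustrative assumptions, not the poster's actual model):

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Backbone with two switchable classification heads."""
    def __init__(self, feat_dim=512, num_sub_1=10, num_sub_2=5):
        super().__init__()
        # Stand-in for the resnet trunk
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU()
        )
        self.head_1 = nn.Linear(feat_dim, num_sub_1)  # Dataset1 head
        self.head_2 = nn.Linear(feat_dim, num_sub_2)  # Dataset2 head

    def forward(self, x, dataset_id=1):
        feat = self.backbone(x)
        # The flag selects which linear head produces the logits
        return self.head_1(feat) if dataset_id == 1 else self.head_2(feat)

model = TwoHeadNet()
x = torch.randn(4, 3, 32, 32)
out1 = model(x, dataset_id=1)  # logits for Dataset1, shape (4, 10)
out2 = model(x, dataset_id=2)  # logits for Dataset2, shape (4, 5)
```

This way only one head runs per batch, so the unused head receives no gradient for that step without any requires_grad bookkeeping.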


Sorry, I just revised the snippet earlier. The following is my actual implementation, in which I trained only one end while freezing the other.

# x, y_pred_1 = resnet(x, y)
x, _ = resnet(x, y) 
y_pred_2 = new_logits(x)

# loss_1 = loss_fn_1(y_pred_1, y)
loss_1 = 0
loss_2 = loss_fn_2(y_pred_2, y)
loss = loss_1 + loss_2

Thanks for the update.
It looks generally alright, but you are still using this approach (assuming y is a sample from Dataset2):

x -> resnet -> output -> criterion -> loss1
y -> resnet -> output -> new_logits -> criterion -> loss2

Is that your workflow, or would you rather switch internally between the last linear layers?


Hmmmm, actually I am pre-training the net using Dataset2, and after this the entire net will be activated to train both losses with the two datasets simultaneously.

Does it mean you would remove new_logits after the pretraining is done?
If that’s the case, then your model seems to be fine.

No, both heads (resnet.logits and new_logits) will be trained together, with two inputs, one from each of dataset 1 and dataset 2.
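That joint phase could be sketched like this (the module sizes, optimizer, and batch shapes are assumptions; backbone stands in for the resnet trunk): each step draws one batch from each dataset, routes each through its own head, and sums the two CE losses before a single backward pass:

```python
import torch
import torch.nn as nn

feat_dim, num_sub_1, num_sub_2 = 512, 10, 5
backbone = nn.Linear(100, feat_dim)        # stand-in for the resnet trunk
logits_1 = nn.Linear(feat_dim, num_sub_1)  # plays the role of resnet.logits
logits_2 = nn.Linear(feat_dim, num_sub_2)  # plays the role of new_logits
loss_fn = nn.CrossEntropyLoss()

# One optimizer covering the trunk and both heads
params = (list(backbone.parameters())
          + list(logits_1.parameters())
          + list(logits_2.parameters()))
optimizer = torch.optim.SGD(params, lr=0.01)

x1, y1 = torch.randn(8, 100), torch.randint(0, num_sub_1, (8,))  # Dataset1 batch
x2, y2 = torch.randn(8, 100), torch.randint(0, num_sub_2, (8,))  # Dataset2 batch

optimizer.zero_grad()
loss = loss_fn(logits_1(backbone(x1)), y1) + loss_fn(logits_2(backbone(x2)), y2)
loss.backward()   # both heads and the shared trunk receive gradients
optimizer.step()
```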