Questions about the auxiliary classifier of InceptionV3

Hi,

I’m new to transfer learning and I have two questions about InceptionV3.

  1. I’m following the PyTorch transfer learning tutorial and I’m going to do ‘ConvNet as fixed feature extractor’ (https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). So I should freeze all the layers except the FINAL fc layer. But what about the fc layer in the auxiliary classifier? Am I supposed to unfreeze that fc layer as well?

  2. According to my understanding, if we don’t set the model to evaluation mode (model.eval()), there will be two outputs: one from the auxiliary fc layer and the other from the final fc layer. So we have to set the model to eval mode when testing it. Is this correct? (See the small sketch below for what I mean.)
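Here is a minimal sketch of the behaviour I mean, assuming torchvision’s inception_v3 (using the older pretrained= argument; newer torchvision versions use weights= instead) and a dummy 299x299 input:

```python
import torch
from torchvision import models

model = models.inception_v3(pretrained=True)  # aux_logits=True by default

x = torch.randn(1, 3, 299, 299)  # dummy input at InceptionV3's expected resolution

model.train()
out = model(x)   # two outputs: main logits and aux logits
                 # (a plain tuple or an InceptionOutputs named tuple, depending on the torchvision version)

model.eval()
out = model(x)   # a single tensor of shape [1, 1000]
```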

Thanks a lot!!

You can disable the auxiliary output by setting aux_logits=False.
I would start the fine-tuning by disabling it and just re-training the final classifier.
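Something like this (a minimal sketch, assuming a torchvision version that still accepts pretrained=True and a hypothetical num_classes):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load pretrained InceptionV3 without the auxiliary branch.
model = models.inception_v3(pretrained=True, aux_logits=False)

# Freeze the whole backbone (fixed feature extractor).
for param in model.parameters():
    param.requires_grad = False

num_classes = 10  # hypothetical, adjust to your dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new layer, trainable by default

# Only pass the new parameters to the optimizer.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```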

Hi,
Thanks for your reply. But why wouldn’t you retrain the auxiliary classifier and the final classifier together?

The aux_logits are created a bit deeper in the model (line of code), so it would only make sense to use them if your fine-tuning goes further down the model (below self.Mixed_6e(x)). Maybe I misunderstood your first post, but I thought you just wanted to re-create the last linear layer.

If you want to fine-tune the whole model or just the part beyond the aux output, you could of course re-use the aux_logits. I’m not sure if that’s the usual approach, but it’s definitely worth a try!
In that case you would also have to re-create the final linear layer in InceptionAux, e.g. as in the sketch below.
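A rough sketch of that setup (assuming torchvision’s module names Mixed_7a/Mixed_7b/Mixed_7c for the blocks after the aux branch and a hypothetical num_classes):

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical

model = models.inception_v3(pretrained=True, aux_logits=True)

# Freeze everything first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the blocks that come after the point where the aux branch splits off.
for name, param in model.named_parameters():
    if name.startswith(("Mixed_7a", "Mixed_7b", "Mixed_7c")):
        param.requires_grad = True

# Re-create both classifiers; newly created layers require gradients by default.
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)
model.fc = nn.Linear(model.fc.in_features, num_classes)
```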

I’m sorry, maybe I did not explain it clearly. What I wanted to do is re-create both the last fc layer and the fc layer within the auxiliary classifier, then re-train only these two layers. Therefore, for each training step, we’ll have two outputs (one from the auxiliary classifier and one from the final fc layer) and two losses, loss_1 and loss_2, then do backprop via (loss_1 + loss_2).backward().
Do you think it’ll work…? Something like the sketch below.
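A minimal sketch of what I have in mind (assuming torchvision’s inception_v3 with the older pretrained= API, a hypothetical num_classes and a dummy batch):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical

model = models.inception_v3(pretrained=True, aux_logits=True)

# Freeze the backbone.
for param in model.parameters():
    param.requires_grad = False

# Re-create both classifiers; the new layers are trainable by default.
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss()
params = list(model.fc.parameters()) + list(model.AuxLogits.fc.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)

model.train()  # training mode, so the forward pass returns both outputs
images = torch.randn(2, 3, 299, 299)           # dummy batch
targets = torch.randint(0, num_classes, (2,))

optimizer.zero_grad()
outputs, aux_outputs = model(images)           # tuple / named tuple, depending on the torchvision version
loss_1 = criterion(outputs, targets)
loss_2 = criterion(aux_outputs, targets)
(loss_1 + loss_2).backward()
optimizer.step()
```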

The approach would work (summing the losses and backpropagating through the sum), but it’s probably not necessary if you don’t want to fine-tune any layers below the auxiliary classifier.
Assuming that all layers are frozen (i.e. do not require gradients) except the last linear layer, the auxiliary loss simply has no effect, because none of the parameters along its path can receive a gradient.
In the original paper the aux loss was used to “enhance” the gradients at this point. I.e. the loss from the model output would be used to calculate the gradients for all layers. Since the model is quite deep, the gradients tend to vanish in the first layers. Therefore an auxiliary classifier was used to inject another gradient signal, which is added to the “output signal” so that valid gradients reach even the first layers.
In your case, if the layers are frozen, the aux loss won’t do anything besides being calculated (apart from training the aux classifier itself, which you throw away at test time anyway), as the quick check below shows.
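Quick check (a sketch; the pretrained weights don’t matter here, only the gradient flow, and num_classes is hypothetical):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical
model = models.inception_v3(aux_logits=True)  # random weights are fine for this check

# Freeze everything, then re-create the two classifiers (trainable by default).
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)

model.train()
x = torch.randn(2, 3, 299, 299)
y = torch.randint(0, num_classes, (2,))
out, aux_out = model(x)

# Backpropagate only the auxiliary loss: the frozen backbone blocks its path,
# so the final classifier never sees a gradient from it.
nn.CrossEntropyLoss()(aux_out, y).backward()
print(model.fc.weight.grad is None)            # True  -> the aux loss cannot influence the final fc
print(model.AuxLogits.fc.weight.grad is None)  # False -> only the aux fc (discarded at test time) is updated
```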

Correct me, if I misunderstood your use case.


Thanks a lot! Problem solved!