Loss.backward with some intermediate layer frozen

Hi,

I would like to ask if I can do the following.

input → layer1,2,3 → layer4,5,6 (CNN and RNN) → predicted output.
Then I compute an L1 loss between the predicted output and the target output.

I would like to freeze the weights of layers 4,5,6 so that the loss updates the parameters of layers 1,2,3 only.

I tried setting
layer4,5,6.eval()
and then calling the forward passes of layers 4,5,6,

but I get this error when calling loss.backward():
RuntimeError: cudnn RNN backward can only be called in training mode

Many thanks for any advice.

.eval() does not freeze your layers. It puts them in “evaluation mode” (as opposed to training mode), which should only be done when testing/validating your model. To freeze layers while training, set requires_grad of the parameters of layers 4,5,6 to False instead (see the sketch below).
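
A minimal sketch of what that looks like, using hypothetical stand-in modules (layer123 for the trainable part, layer456 for the RNN you want frozen). The frozen module stays in training mode, so the cudnn RNN backward is still allowed, but its parameters accumulate no gradients:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real layers; replace with your own modules.
layer123 = nn.Conv1d(8, 16, kernel_size=3, padding=1)               # layers 1,2,3 (trainable)
layer456 = nn.GRU(input_size=16, hidden_size=16, batch_first=True)  # layers 4,5,6 (to be frozen)

# Freeze layers 4,5,6: no .eval() here, just turn off gradient accumulation
# for their parameters. The modules remain in training mode.
for p in layer456.parameters():
    p.requires_grad = False

x = torch.randn(4, 8, 10)                # (batch, channels, time)
target = torch.randn(4, 10, 16)

features = layer123(x).transpose(1, 2)   # (batch, time, channels) for the RNN
pred, _ = layer456(features)

loss = nn.functional.l1_loss(pred, target)
loss.backward()   # gradients flow through the frozen RNN back into layer123

print(layer123.weight.grad is not None)            # True: layers 1,2,3 got gradients
print(next(layer456.parameters()).grad is None)    # True: frozen parameters got none
```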

To freeze parts of your model, simply apply .requires_grad_(False) to the parameters that you don’t want updated. And as described above, since computations that use these parameters as inputs would not be recorded in the forward pass, they won’t have their .grad fields updated in the backward pass because they won’t be part of the backward graph in the first place, as desired.

Because this is such a common pattern, requires_grad can also be set at the module level with nn.Module.requires_grad_(). When applied to a module, .requires_grad_() takes effect on all of the module’s parameters (which have requires_grad=True by default).
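
As a sketch of the module-level form (again with hypothetical module names), the whole frozen block can be switched off in one call, and the optimizer can be built from only the parameters that still require gradients:

```python
import torch
import torch.nn as nn

# Hypothetical modules standing in for the two stages.
layer123 = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
layer456 = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)

# Module-level freeze: sets requires_grad=False on every parameter of layer456.
layer456.requires_grad_(False)

# Hand the optimizer only the parameters that will actually be updated.
optimizer = torch.optim.Adam(
    (p for p in layer123.parameters() if p.requires_grad), lr=1e-3
)

x = torch.randn(4, 10, 32)
target = torch.randn(4, 10, 64)

pred, _ = layer456(layer123(x))
loss = nn.functional.l1_loss(pred, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()   # updates layers 1,2,3 only; layers 4,5,6 stay unchanged
```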

See Autograd mechanics — PyTorch 1.9.0 documentation