Freeze a few filters in a convolution layer

While training a ResNet, I want to freeze some of the filters in the first conv layer. So for:
`self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)` - This layer has 64 filters in total, and I want to freeze half of them (say the first 32) and let the other half train. How can I do this?

You can register a backward hook on the weight that zeros out the gradient for the parts of the weight you don't want to train.
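A minimal sketch of that idea, assuming the first 32 output filters are the ones to freeze (the filter count and layer shape come from the question above):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)

# Tensor hook: the returned tensor replaces the gradient of conv1.weight.
# Zeroing the first 32 filters' gradient means plain SGD won't move them.
def freeze_first_half(grad):
    grad = grad.clone()   # avoid modifying the autograd-provided gradient in place
    grad[:32] = 0.0
    return grad

conv1.weight.register_hook(freeze_first_half)
```

After a backward pass, `conv1.weight.grad[:32]` is all zeros while the second half receives normal gradients.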

Hi Just (and Soulitzer)!

I would caution against zeroing out gradients (whether with a hook or by some
other method) as a general technique for freezing weights. The reason is that
many optimizers (for example, SGD with weight decay) will modify parameters
even if their gradients are zero.

(If you are using plain-vanilla SGD, with no momentum and no weight decay,
zeroing the gradients will freeze the corresponding parameters.)

For the specific use case you describe, you could either:

  1. Split conv1 into two `Conv2d(3, 32)`s (and cat their outputs back
    together to pass into a subsequent `Conv2d(64, out_channels)`)
    and leave the second `Conv2d(3, 32)` out of the optimizer.

  2. Use a single `Conv2d(3, 64)`, save copies of the parameter values
    you want frozen, call optimizer.step(), and copy back the frozen
    parameter values.
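Option 1 might look like the sketch below. The `SplitConv` module name is made up for illustration; setting `requires_grad = False` on the frozen half is an extra step beyond what was described, but it also skips gradient computation for those filters:

```python
import torch
import torch.nn as nn

class SplitConv(nn.Module):
    """First conv layer split into a frozen half and a trainable half."""
    def __init__(self):
        super().__init__()
        self.frozen = nn.Conv2d(3, 32, kernel_size=3, stride=1,
                                padding=1, bias=False)
        self.trainable = nn.Conv2d(3, 32, kernel_size=3, stride=1,
                                   padding=1, bias=False)
        # No gradients for the frozen half at all.
        for p in self.frozen.parameters():
            p.requires_grad = False

    def forward(self, x):
        # Cat along the channel dimension to recover the original 64 channels.
        return torch.cat([self.frozen(x), self.trainable(x)], dim=1)

model = SplitConv()
# Leave the frozen half out of the optimizer entirely.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.1, momentum=0.9, weight_decay=1e-4,
)
```

Because the frozen parameters never reach the optimizer, neither momentum nor weight decay can touch them.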

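And a sketch of option 2, the save/step/restore pattern, with a random batch and a dummy loss standing in for real training code:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
# Momentum and weight decay would both perturb "frozen" weights if we
# merely zeroed their gradients, so we restore the saved values instead.
optimizer = torch.optim.SGD(conv1.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

x = torch.randn(4, 3, 8, 8)        # stand-in for a real input batch

before = conv1.weight.detach().clone()
frozen = before[:32]               # the first 32 filters stay fixed

optimizer.zero_grad()
conv1(x).sum().backward()          # stand-in for a real loss
optimizer.step()

# Copy the saved values back over whatever the optimizer did to them.
with torch.no_grad():
    conv1.weight[:32] = frozen
```

In a real training loop the save and copy-back would bracket every `optimizer.step()` call.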

K. Frank

Good point. However, if you zeroed out the gradients with respect to those parameters from the very beginning, I wouldn't expect any momentum to build up, because the gradients have always been zero.