For training a ResNet, I want to freeze some of the convolution filters in the first conv layer. So for:
self.conv1 = nn.Conv2d(3,64, kernel_size=3, stride=1, padding=1, bias=False)
- This layer has 64 filters in total, and I want to freeze half of them (say the first 32) and let the other half train. How can I do this?
You can register a backward hook on the weight that zeros out the gradient for the parts of the weight that you don’t want to train.
https://pytorch.org/docs/stable/generated/torch.Tensor.register_hook.html?highlight=register_hook#torch.Tensor.register_hook
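As a rough sketch (assuming, as in the question, that the first 32 of the 64 filters are the ones to freeze; the helper name zero_frozen_grad is just for illustration):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)

# The weight has shape [out_channels, in_channels, kH, kW] = [64, 3, 3, 3],
# so zeroing grad[:32] blocks updates to the first 32 filters.
def zero_frozen_grad(grad):
    grad = grad.clone()   # don't modify the incoming gradient in place
    grad[:32] = 0.0
    return grad

conv1.weight.register_hook(zero_frozen_grad)
```

The hook runs every time a gradient is computed for conv1.weight, so the first 32 filters receive a zero gradient on every backward pass.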
Hi Just (and Soulitzer)!

I would caution against zeroing out gradients (whether with a hook or by some other method) as a general technique for freezing weights. The reason is that many optimizers (for example, SGD with weight decay) will modify parameters even if their gradients are zero. (If you are using plain-vanilla SGD, with no momentum and no weight decay, zeroing the gradients will freeze the corresponding parameters.)
For the specific use case you describe, you could either:

- Split conv1 into two Conv2d(3, 32)'s (and cat their outputs back together to pass into a subsequent Conv2d(64, out_channels)), and leave the second Conv2d(3, 32) out of the optimizer (see the first sketch below).
- Use a single Conv2d(3, 64), save copies of the parameter values you want frozen, call optimizer.step(), and copy back the frozen parameter values (see the second sketch below).
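A rough sketch of the first option (the module name SplitConv is just for illustration, and this assumes the frozen half is the first 32 filters):

```python
import torch
import torch.nn as nn

class SplitConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_frozen = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.conv_train = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)

    def forward(self, x):
        # cat along the channel dimension to recover a 64-channel output
        return torch.cat([self.conv_frozen(x), self.conv_train(x)], dim=1)

model = SplitConv()
# Only the trainable half (plus the rest of the network) goes into the optimizer.
optimizer = torch.optim.SGD(model.conv_train.parameters(), lr=0.1, momentum=0.9)
```

And a sketch of the second option, saving and restoring the frozen slice around optimizer.step() (the hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
optimizer = torch.optim.SGD(conv1.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# ... forward pass, loss.backward() ...
frozen = conv1.weight[:32].detach().clone()   # remember the values to keep
optimizer.step()
with torch.no_grad():
    conv1.weight[:32] = frozen                # restore the frozen filters
```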
Best.
K. Frank
Good point. However, if you zero out the gradients w.r.t. those parameters from the very beginning, I wouldn’t expect any momentum to build up, because the gradients have always been zero.
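If it helps, a quick way to check both points in isolation (a toy parameter whose gradient is always zero):

```python
import torch

p = torch.nn.Parameter(torch.ones(3))

# Momentum only: a gradient that is always zero never builds momentum,
# so the parameter does not move.
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9)
for _ in range(5):
    opt.zero_grad()
    p.grad = torch.zeros_like(p)
    opt.step()
print(p)   # still all ones

# With weight decay, the parameter shrinks even though its gradient is zero.
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9, weight_decay=0.1)
for _ in range(5):
    opt.zero_grad()
    p.grad = torch.zeros_like(p)
    opt.step()
print(p)   # no longer all ones
```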