I am trying to have different layers of the network trained during specific epochs, so I set requires_grad for those layers to True and then False. I noticed that:
self.model.conv2[0].requires_grad is True
self.model.conv2[0].weight.requires_grad is False
although I set self.model.conv2[0].requires_grad = False,
where self.model.conv2[0] is Conv2d(12, 25, kernel_size=(3, 3), stride=(1, 1)).

Shouldn’t requires_grad propagate to the parameters of the layer (weights and biases)?

nn.Modules don’t have a requires_grad field, so creating one and setting it to True won’t change anything.
You can use zero_grad() on a module to set all of its gradients to zero, or use parameters() to get an iterator over the parameters of your module so that you can set requires_grad on each of them.
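For example, a minimal sketch of toggling requires_grad via parameters() (the model here is just an illustrative stand-in for your network):

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze every parameter of the module (parameters() recurses into submodules)
for p in model.parameters():
    p.requires_grad_(False)

# Later, for the epochs where these layers should train again, flip it back
for p in model.parameters():
    p.requires_grad_(True)
```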

Sorry to revive an old topic, but I have kind of the same question:

Is there an easy way to set requires_grad simultaneously to all the parameters associated with a module (recursively if said module contains other modules)?

You could write a method which accepts a module, checks for valid parameters (weight and bias), and manipulates the requires_grad attribute, similar to a weight_init method:

import torch.nn as nn

def set_requires_grad(m, requires_grad):
    if hasattr(m, 'weight') and m.weight is not None:
        m.weight.requires_grad_(requires_grad)
    if hasattr(m, 'bias') and m.bias is not None:
        m.bias.requires_grad_(requires_grad)

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.module1 = nn.Sequential(
            nn.Linear(1, 1),
            nn.ReLU(),
            nn.Linear(1, 1, bias=False)
        )
        self.module2 = nn.Sequential(
            nn.Linear(1, 1),
            nn.ReLU(),
            nn.Linear(1, 1, bias=False)
        )

model = MyModel()
# apply() calls set_requires_grad on module1 and all of its submodules
model.module1.apply(lambda m: set_requires_grad(m, False))

print(model.module1[0].weight.requires_grad)
> False
print(model.module1[0].bias.requires_grad)
> False
print(model.module2[2].weight.requires_grad)
> True
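As a side note, newer PyTorch versions (1.1+, if I recall correctly) also provide nn.Module.requires_grad_(), which sets the flag on all parameters of the module recursively, so the same freezing can be sketched without apply():

```python
import torch.nn as nn

module1 = nn.Sequential(
    nn.Linear(1, 1),
    nn.ReLU(),
    nn.Linear(1, 1, bias=False)
)

# requires_grad_ on a module propagates to every parameter it contains
module1.requires_grad_(False)

print(module1[0].weight.requires_grad)  # False
print(module1[2].weight.requires_grad)  # False
```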

The problem I have with using m.parameters() or m.named_parameters() is that, if I am not mistaken, they do not return the affine parameters of the batch norm layers.
Could you explain how to set our batch norm layers to requires_grad=False?
Thanks!