How to freeze or fix the specific(subset, partial) weight in convolution filter

Above is my code for trying to fix, freeze the subset(temp_5.values ) of 25th convolution layer’s filter. And the result.
I try to set the subset filter’s gradient as 0 so not to update. And the weight has been changed.

So two questions

  1. I wonder why the gradient becomes not zero?

  2. Is there other solution to fix the subset convolution filter?

Which optimizer are you using?
If you are using an optimizer with running estimates, note that even with a zero gradient, the parameters might still be updated, if the running estimates were previously calculated.
E.g. optim.Adam would show such a behavior.

PS: It’s better to post the code directly by wrapping it into three backticks ``` as it makes debugging easier. :wink:

1 Like

Thanks for the reply this is my code. I use the SGD optimizer

optimizer = optim.SGD(im_net.parameters(), lr=0.1, momentum=0.9,
weight_reference = im_net.features[25].weight[temp_4].clone()
for batch_idx, (inputs, targets) in enumerate(testloader):

    inputs = inputs.float().cuda()
    inputs, targets =,
    outputs = im_net(inputs)
    ce_loss = criterion(outputs, targets)
    loss = criterion(outputs, targets)
    #im_net.features[25].weight[temp_4].register_hook(lambda grad : grad * 0)
    im_net.features[25].weight.grad[temp_4] = 0.
    print(weight_reference == im_net.features[25].weight[temp_4])


By the way, could you explain about ‘optimizer with running estimates’? I don’t know what it means.

momentum would be such an internal state, which uses the previous gradients to update the parameters even with a zero gradients.
Here is a small example showing this behavior:

# Setup
model = nn.Conv2d(1, 10, 3, 1, 1)
weight_reference = model.weight.clone()
optimizer = torch.optim.SGD(model.parameters(), lr=1., momentum=0.0)

# Should fail
model(torch.randn(1, 1, 3, 3)).mean().backward()
print((model.weight == weight_reference).all())

# Should work with momentum = 0.0
weight_reference = model.weight.clone()

model.weight.register_hook(lambda grad: grad * 0)
model(torch.randn(1, 1, 3, 3)).mean().backward()

print((model.weight == weight_reference).all())

The parameters should not be updated using mementum=0.0, while any other value for momentum will still update the parameters.

1 Like

Thanks for the reply again :grinning:
I change the momentum to zero but it still changes the gradient and the weight still updates.
So I made the weight decay to zero and it works! I wonder why weight decay makes this difference.
P.S If i have to set weight decay to zero, then is there other solution to give regularization like the weight decay do?

I think you would have to define the weight decay manually, e.g. as shown here, and filter out the parameters, which should not be added to weight decay.

1 Like

Hi @ptrblck,

Regarding the fact that parameters are still updated even when the grad is zero (momentum, decay, etc), will setting the grads to none solve this issue? ie, optimizer.zero_grad(set_to_none=True)?


Yes, this should be the case, since the optimizers should check if the gradients are even set (e.g. here for Adam). In any case, I would always double check the behavior especially if you are using custom optimizers, as they might use another workflow.

1 Like

Okay, this is the behaviour I was looking for. Thank you for your time!

Hi, I would like to ask whether this method works for only freezing a part of the parameter.
Since the code of Adam here is probably checking whether the whole parameter is None.

for p in group['params']:
    if p.grad is not None:

Then if I only want to freeze the subset of the parameter as @hongjunchoi does, does optimizer.zero_grad(set_to_none=True) works when setting gradient of the subset to zero?


I’m unsure what “subset” refers to, but note that you cannot set some gradients to None and others to valid values for the same parameter. Each parameter has a .grad attribute which is either set to None or a tensor.

Thanks for the reply!

My definition of the “subset” is same as @hongjunchoi .
My goal is to freeze the subset of the parameter, and the subset is given by a list of index id=[0,1,2,...]. Therefore, I set the subset’s gradient to zero, and the code is like:


This method doesn’t work if the optimizer has momentum part.
However, I’m not sure whether optimizer.zero_grad(set_to_none=True) would prevent Adam from updating the subset since param.grad is not None in this case.

optimizer.zero_grad(set_to_none=True) is irrelevant, since you are setting the gradient to zero after a valid gradient was already calculated.
The optimizer.zero_grad() call could have set the .grad to None before, but since a backward() pass was executed afterwards, the .grad attribute is already populated.
As described in my previous post, it’s not possible to set an attribute to None and a valid value.

I see. Is it impossible to freeze the subset of the parameter if I use Adam optimizer? Or there are others ways to do so instead of setting subset’s gradient to zero.


I would recommend to explicitly restore the “frozen” values of the parameter as described in this post.

1 Like