How to freeze or fix a specific subset of weights in a convolution filter

Above is my code, which tries to freeze a subset (temp_5.values) of the 25th convolution layer’s filter, along with the result.
I set the subset filter’s gradient to 0 so it would not be updated, but the weights still changed.

So, two questions:

  1. Why does the gradient become non-zero, so the weights are still updated?

  2. Is there another way to freeze a subset of a convolution filter?

Which optimizer are you using?
If you are using an optimizer with running estimates, note that even with a zero gradient the parameters might still be updated, if the running estimates were calculated in previous steps.
E.g. optim.Adam shows this behavior.
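As a minimal sketch of this behavior (not from the original post): after one real step, Adam's running averages are non-zero, so a subsequent step with an explicitly zeroed gradient still moves the parameter.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
param = nn.Parameter(torch.ones(1))
optimizer = torch.optim.Adam([param], lr=0.1)

# One real step so Adam builds its running estimates
(param * 2).sum().backward()
optimizer.step()

# Zero the gradient explicitly and step again
param.grad = torch.zeros_like(param)
before = param.clone()
optimizer.step()
print(torch.equal(before, param))  # False: the running estimates still update it
```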

PS: It’s better to post the code directly by wrapping it into three backticks ``` as it makes debugging easier. :wink:


Thanks for the reply! This is my code. I use the SGD optimizer:

```python
optimizer = optim.SGD(im_net.parameters(), lr=0.1, momentum=0.9)
weight_reference = im_net.features[25].weight[temp_4].clone()
for batch_idx, (inputs, targets) in enumerate(testloader):

    inputs = inputs.float().cuda()
    targets = targets.cuda()

    optimizer.zero_grad()
    outputs = im_net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()

    # zero out the subset's gradient before the update
    #im_net.features[25].weight[temp_4].register_hook(lambda grad : grad * 0)
    im_net.features[25].weight.grad[temp_4] = 0.
    optimizer.step()

    print(weight_reference == im_net.features[25].weight[temp_4])
```


By the way, could you explain what an ‘optimizer with running estimates’ is? I don’t know what it means.

momentum would be such an internal state: it uses the previous gradients to update the parameters even when the current gradient is zero.
Here is a small example showing this behavior:

```python
import torch
import torch.nn as nn

# Setup
model = nn.Conv2d(1, 10, 3, 1, 1)
weight_reference = model.weight.clone()
optimizer = torch.optim.SGD(model.parameters(), lr=1., momentum=0.0)

# Should fail: no hook, so the weight is updated
model(torch.randn(1, 1, 3, 3)).mean().backward()
optimizer.step()
print((model.weight == weight_reference).all())

# Should work with momentum = 0.0
weight_reference = model.weight.clone()
optimizer.zero_grad()

model.weight.register_hook(lambda grad: grad * 0)
model(torch.randn(1, 1, 3, 3)).mean().backward()
optimizer.step()

print((model.weight == weight_reference).all())
```

The parameters should not be updated with momentum=0.0, while any other value for momentum will still update the parameters.


Thanks for the reply again :grinning:
I changed the momentum to zero, but the gradient still changed and the weights were still updated.
So I set the weight decay to zero as well, and now it works! I wonder why weight decay makes this difference.
P.S. If I have to set weight decay to zero, is there another way to apply regularization like weight decay does?

Weight decay is applied inside the optimizer as an extra `weight_decay * param` term added to the gradient, so even a gradient you zeroed out becomes non-zero again. I think you would have to apply the weight decay manually, e.g. as shown here, and filter out the parameters which should not receive it.
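As a sketch of the manual approach (the toy layer and the `frozen` mask are assumptions standing in for `im_net.features[25]` and `temp_4`): add the L2 penalty to the loss yourself, restricted to the filters that are allowed to move, and keep the optimizer's `weight_decay` at zero.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real layer and the frozen index set
conv = nn.Conv2d(1, 4, 3)
frozen = torch.zeros(4, dtype=torch.bool)
frozen[:2] = True  # hypothetical subset to freeze (plays the role of temp_4)

optimizer = torch.optim.SGD(conv.parameters(), lr=0.1, momentum=0.0, weight_decay=0.0)
wd = 5e-4

out = conv(torch.randn(1, 1, 8, 8)).mean()
# Manual weight decay: L2 penalty only over the non-frozen filters
l2 = (conv.weight[~frozen] ** 2).sum()
loss = out + wd * l2
loss.backward()

# Zero the frozen subset's gradient before stepping
conv.weight.grad[frozen] = 0.0
before = conv.weight[frozen].clone()
optimizer.step()
print(torch.equal(before, conv.weight[frozen]))  # True: frozen filters unchanged
```

Because the penalty never touches the frozen filters and their gradient is zeroed, SGD with zero momentum and zero built-in weight decay leaves them untouched.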


Hi @ptrblck,

Regarding the fact that parameters are still updated even when the grad is zero (momentum, decay, etc.), will setting the grads to None solve this issue? I.e., optimizer.zero_grad(set_to_none=True)?


Yes, this should be the case, since the optimizers should check if the gradients are even set (e.g. here for Adam). In any case, I would always double-check the behavior, especially if you are using custom optimizers, as they might use a different workflow.
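A small check of this (a sketch, not from the thread): with the gradient set to None, `optimizer.step()` skips the parameter entirely, even though Adam's running estimates are non-zero.

```python
import torch
import torch.nn as nn

param = nn.Parameter(torch.ones(1))
optimizer = torch.optim.Adam([param], lr=0.1)

# One real step so Adam has non-zero running estimates
(param * 2).sum().backward()
optimizer.step()

# Setting grads to None makes the optimizer skip the parameter
optimizer.zero_grad(set_to_none=True)
before = param.clone()
optimizer.step()
print(torch.equal(before, param))  # True: no update without a .grad
```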


Okay, this is the behaviour I was looking for. Thank you for your time!