Problem with zeroing gradients

Hi, I have a problem that has been bothering me for a week; any suggestions would be appreciated.

The problem is that when I try to zero out part of the gradients, the corresponding weights are not frozen as expected: they keep moving by a very small amount, on the order of a truncation error.

In my application, I need to zero out part of the gradients under specific conditions. Example code to reproduce the error is sketched below, after the list of zeroing methods.

The code is modified from one of the official PyTorch examples.

I tried the following three ways of zeroing out the gradients, and the error is there in all of them:

model.conv1.weight.grad *= 0
model.conv1.weight.grad.fill_(0)
model.conv1.weight.grad.data.zero_()
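Here is a minimal sketch of the kind of reproduction I mean (an illustrative stand-in with a dummy model and random data, assuming SGD with momentum like the example I started from, not my actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3)
        self.fc = nn.Linear(4 * 26 * 26, 10)

    def forward(self, x):
        return self.fc(F.relu(self.conv1(x)).flatten(1))

torch.manual_seed(0)
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def step(zero_conv1):
    x = torch.randn(8, 1, 28, 28)           # dummy batch
    target = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    F.cross_entropy(model(x), target).backward()
    if zero_conv1:
        model.conv1.weight.grad.zero_()     # try to freeze conv1
    optimizer.step()

# A few normal steps first (in my application the zeroing only
# kicks in later, under specific conditions).
for _ in range(3):
    step(zero_conv1=False)

before = model.conv1.weight.detach().clone()
for _ in range(5):
    step(zero_conv1=True)

# I would expect 0 here, but I get a small nonzero drift.
print((model.conv1.weight.detach() - before).abs().max().item())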

The error is small, so it does not affect performance much, but I'd like to understand how this happens. Thanks a lot.

If you use momentum or weight decay (from the optimizer), the parameters will change even when the grad has the numerical value 0.
Optimizers do special-case None gradients to mean "do nothing", but that isn't available for only part of a parameter. In your example, though, model.conv1.weight.grad = None would do the trick.
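As a minimal illustration of the mechanism (just a sketch with a single parameter and plain SGD):

import torch

p = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.SGD([p], lr=0.1, momentum=0.9)

# One step with a real gradient populates the momentum buffer.
p.grad = torch.ones(3)
opt.step()                                  # buffer is now 1.0

w = p.detach().clone()
p.grad = torch.zeros(3)                     # numerically zero gradient
opt.step()                                  # update = lr * (0.9 * buffer) != 0
print((p.detach() - w).abs().max().item())  # ~0.09: the parameter still moved

w = p.detach().clone()
p.grad = None                               # None: the optimizer skips this parameter
opt.step()
print((p.detach() - w).abs().max().item())  # exactly 0.0

# Weight decay behaves similarly: the optimizer adds weight_decay * p
# to the gradient, so a numerically zero grad still yields an update.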

Best regards

Thomas

Thanks, Thomas.

Now I understand what is happening.