A weird zero_grad problem

I set requires_grad=False for some parameters and their gradients are zero. However, when I call optim.step(), the values of those parameters are still updated.

This is my code:

# temporarily freeze the encoder, remembering each parameter's original flag
enc_flags = [p.requires_grad for p in self.G.enc_layers.parameters()]
for p in self.G.enc_layers.parameters():
    p.requires_grad = False

z_rec = self.G(img_fake, mode='enc')
gzr_loss = F.mse_loss(z_rec[-1], zs_a[-1])

import pdb
pdb.set_trace()
# the problem appears in the following three lines
self.optim_G.zero_grad()
(self.lambda_gzr * gzr_loss).backward(retain_graph=True)
self.optim_G.step()

# restore the original requires_grad flags
for p, flag in zip(self.G.enc_layers.parameters(), enc_flags):
    p.requires_grad = flag

The screenshot below shows the values during debugging.

If these parameters were previously updated by an optimizer that keeps running estimates (such as Adam), setting their requires_grad attribute to False can still result in updates: even with a zero gradient, the step is driven by the stored running estimates.
If that's the case, you might need to set these gradients to None, so that the parameters will be skipped, as shown here for Adam.
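As a minimal, self-contained sketch of that behaviour (only torch is assumed; the single parameter and learning rate are made up for illustration), Adam keeps updating a parameter whose gradient is zero, but skips it once its .grad is None:

import torch

# Toy parameter and optimizer, purely for illustration.
p = torch.nn.Parameter(torch.ones(1))
optim = torch.optim.Adam([p], lr=0.1)

# One regular step to build up non-zero running estimates.
(p * 2).sum().backward()
optim.step()

# Zeroed (but not None) gradient: the running estimates still move the parameter.
optim.zero_grad(set_to_none=False)
before = p.detach().clone()
optim.step()
print((p - before).abs().item())  # > 0

# None gradient: Adam skips this parameter entirely.
p.grad = None
before = p.detach().clone()
optim.step()
print((p - before).abs().item())  # 0.0

In your snippet this would amount to setting p.grad = None for every parameter in self.G.enc_layers.parameters() right before self.optim_G.step(); recent PyTorch versions also offer optimizer.zero_grad(set_to_none=True) to do this for all parameters at once.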

It works. Thanks a lot.