What happens if I don't set certain parameters to `requires_grad=False` but simply exclude them from the optimizer params?

Hello there. I want to finetune a pretrained model on my own datasets. I found that one usually has to set some parameters to not require gradients, and then pass only the parameters that need updating to the optimizer. So if I don't touch `requires_grad` at all, but just exclude the parameters I don't want to update from the optimizer, what will happen? Does this achieve the same effect as the original way?

# original way: freeze the parameters explicitly
for name, p in model.named_parameters():
    if name in not_update:
        p.requires_grad = False
params = [p for p in model.parameters() if p.requires_grad]
optimizer = Adam(params, lr=0.001)

optimizer.zero_grad()
output = model(input)
loss = loss_fn(output, target)
loss.backward()
optimizer.step()


# my way: leave requires_grad alone, just exclude the frozen parameters from the optimizer
params = [p for name, p in model.named_parameters() if name not in not_update]
optimizer = Adam(params, lr=0.001)
optimizer.zero_grad()
output = model(input)
loss = loss_fn(output, target)
loss.backward()
optimizer.step()

The parameters will still get their gradients calculated and accumulated in their .grad attribute. The optimizer will neither update these parameters nor will it set the .grad attribute to zero (or None).
If you never intend to update these parameters, you could ignore this issue. However, if you want to add these parameters to an optimizer later to train them, make sure to zero out their gradients first, as their .grad attributes could contain large (or even overflowing) values.
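To make this concrete, here is a minimal sketch (using an assumed toy `nn.Linear` model, not your actual setup) showing that a parameter left out of the optimizer still receives a gradient in `.grad`, while its values are never updated:

```python
import torch
from torch import nn
from torch.optim import Adam

torch.manual_seed(0)
model = nn.Linear(4, 2)

# "My way": train only the weight; the bias is excluded from the optimizer
# but its requires_grad is still True.
optimizer = Adam([model.weight], lr=0.1)

bias_before = model.bias.detach().clone()

optimizer.zero_grad()
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# The bias still got a gradient computed and accumulated ...
assert model.bias.grad is not None
# ... but the optimizer did not change its values.
assert torch.equal(model.bias.detach(), bias_before)
```

Note that `optimizer.zero_grad()` only clears the gradients of the parameters handed to the optimizer, so `model.bias.grad` would keep accumulating across iterations here.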


Thanks. So in conclusion: the gradients are still accumulated, but the optimizer won't update these parameters. The original way of setting `requires_grad=False` saves computation cost, because the parameters are removed from the autograd graph. Is my understanding right?

Yes, you would save the memory and an accumulation kernel, and I would also prefer the explicit way to make sure you don't run into the unwanted issues described previously (e.g. when these parameters should be updated later).
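For contrast, a minimal sketch (same assumed toy model as above) of the explicit way: with `requires_grad=False` the frozen parameter is cut out of the autograd graph, so no gradient is ever computed or stored for it.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 2)

# "Original way": freeze the bias explicitly.
model.bias.requires_grad = False

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# The weight still gets a gradient ...
assert model.weight.grad is not None
# ... but no gradient is computed for the frozen bias at all.
assert model.bias.grad is None
```

Since `.grad` stays `None` for the frozen parameter, no gradient memory is allocated and no accumulation kernel runs for it, which is where the savings come from.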

Thanks again. The explicit way is clearer and more definite.