I'm trying to freeze all parameters of my model. Setting "param.required_grad = False" is a simple and powerful approach that most developers accept, but I failed to confirm that it has any effect.
As the simplest test of whether freezing works, I first initialize the model and set param.required_grad to False. Then I run the training function on the model and visualize a few of its parameters.
I expected the visualized parameters not to change at all, since I froze them, but they keep changing as the loop goes from one batch to the next. Below is pseudocode for my setup, and what the terminal shows.
Does anybody have an idea?
Thank you.
model = init.model()
for param in model.parameters():
    param.required_grad = False

optimizer = torch.optim.RSMprop(model.parameters(), lr=args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)
model.train()
for i, (inputs, target, meta) in enumerate(train_loader):
    outputs = model(inputs)
    loss = criterion(outputs, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ## visualize a few params
    for param in model.parameters():
        print(param[0])
        break
In the version I am using (0.4.0), the optimizers do not accept parameters with requires_grad=False. We have to manually filter out those params before passing them to the optimizer.
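To illustrate the filtering pattern: with a real model you would write something like torch.optim.RMSprop(filter(lambda p: p.requires_grad, model.parameters()), lr=...). A minimal sketch below, using a hypothetical FakeParam class as a stand-in for torch parameters (assumption: only the requires_grad flag matters for the filter):

```python
# FakeParam is a hypothetical stand-in for a torch parameter; only the
# requires_grad attribute is modeled here.
class FakeParam:
    def __init__(self, name, requires_grad):
        self.name = name
        self.requires_grad = requires_grad

params = [FakeParam("encoder.weight", False),  # frozen
          FakeParam("head.weight", True)]      # trainable

# Only parameters that still require gradients reach the optimizer:
trainable = list(filter(lambda p: p.requires_grad, params))
print([p.name for p in trainable])  # ['head.weight']
```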
Optimizers can't take parameters with requires_grad=False; they will throw an error. But in your case, you haven't really frozen the parameters. There is a typo in the code: it should be param.requires_grad = False, not param.required_grad. Since your parameters still have requires_grad=True, no error is thrown when creating the optimizer. Consequently, it will also optimize those parameters.
I am surprised it didn't throw an error (I checked; this doesn't throw an error in 0.4.0 either). Also, it's RMSprop, not RSMprop.
Setting required_grad does not throw an error, as it just creates a new attribute on the Tensor.
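This is plain Python behavior: assigning to a misspelled attribute silently creates a new attribute instead of raising. A sketch with a hypothetical FakeTensor class standing in for a torch Tensor:

```python
# FakeTensor is a hypothetical stand-in for a torch Tensor.
class FakeTensor:
    def __init__(self):
        self.requires_grad = True   # the real autograd flag

t = FakeTensor()
t.required_grad = False   # typo: silently creates a brand-new attribute

print(t.requires_grad)    # True -- the real flag was never touched
print(t.required_grad)    # False -- the accidental new attribute
```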
As for optimizers only accepting params with requires_grad, I guess that depends on the optimizer, but I don't think there is any strong reason why they couldn't accept frozen ones. They might use a bit more memory than they should, but for most simple optimizers that should be the worst thing that can happen.
In the latest version at least, both optim.SGD and optim.Adam accept parameters with requires_grad=False. All parameters with requires_grad=False will have 0 gradients, so with plain SGD they won't change. Note, though, that if there is L2 regularization or momentum terms already saved for these parameters, they will change! Having requires_grad=False only means, from the optimizer's point of view, that the gradients will be 0. The parameters might still be changed.
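A plain-Python sketch of why a stale momentum buffer moves a frozen parameter. This simulates one parameter under an SGD-with-momentum update rule (assumption: a momentum buffer was accumulated before freezing, and the gradient is now always 0):

```python
lr, momentum = 0.1, 0.9
w = 1.0      # parameter value at the moment of freezing
buf = 0.5    # momentum buffer left over from earlier training
grad = 0.0   # requires_grad=False -> gradient is 0 from now on

for _ in range(3):
    buf = momentum * buf + grad   # buffer decays but stays nonzero
    w -= lr * buf                 # so the parameter keeps moving

print(w)  # no longer 1.0, despite a zero gradient every step
```

The buffer only decays geometrically, so the "frozen" parameter drifts for many steps before the updates become negligible.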
I noticed that 0.2.0 also generates an error when given a model whose parameters all have "requires_grad=False".
And the approach you introduced (using "filter") really helped me. Thank you very much.
Hi @albanD, I am currently using PyTorch v1.9.0.
Could you please tell me why, if the optimizer involves weight decay, the frozen parameters would still change? I am facing the exact same problem: part of the network is supposed to be frozen (say, the encoder) and the rest of the network has to be trained, but with weight decay (I am using 0.01 in my case) with Adam.
To do so, before starting the training I just did -
for p in model.feat.parameters():
    p.requires_grad = False
where I intend to freeze the parameters of model.feat, and then I instantiate the optimizer as follows:
optim = torch.optim.Adam(model.parameters(), lr = args['lr'], weight_decay=args['weight_decay'])
Weight decay applies an update to the parameters that is independent of the gradient. So it will change them whatever the value of the grad.