How to confirm freezing is working?

Hello all,

I’m trying to freeze all parameters of my model. Setting “param.required grad = False” is the simple and powerful way that most developers accept, but I have failed to confirm that it has any effect.

As the simplest test of whether freezing works, I first initialize the model and set “param.required grad” to False for every parameter. Then I run the training function and print a few parameters from the model.

I expected the printed parameters not to change at all, since I froze them, but they keep changing from one batch to the next. Below is rough pseudocode for my setup, followed by what the terminal shows.

Does anybody have an idea?

Thank you.

model = init.model()
for param in model.parameters():
    param.required_grad = False

optimizer = torch.optim.RSMprop(model.parameters(), lr=args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)

model.train()
for i, (inputs, target, meta) in enumerate(train_loader):
    outputs = model(inputs)
    loss = criterion(outputs, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # visualize a few params
    for param in model.parameters():
        print(param[0])
        break

The terminal shows something like:

Processing |####### | (509/2225) Data: 0.052407s | Batch: 1.401s | Total: 0:09:21 | ETA: 0:32:52 | Loss: 0.0322 | Acc: 0.0236
Variable containing:
(0 ,.,.) =
1.00000e-02 *
-5.1111 4.8789 7.2060 4.0981 1.5398 -1.9935 2.5382
-6.8706 -1.9746 5.5094 -1.3251 0.4653 3.0574 -0.6753
5.6140 -7.1054 6.3074 6.6501 -1.3995 6.1489 1.1402
0.1733 3.8403 2.5650 -3.3059 -1.5468 6.1374 -2.1767
8.3077 -0.2914 -7.0128 1.2817 7.1735 -6.6335 -1.9335
-1.6674 -3.0738 2.7778 -3.6561 -0.6739 3.6340 3.9714
7.2121 -0.4230 1.4884 -6.0607 2.8108 8.2189 3.3358

Processing |####### | (510/2225) Data: 0.015237s | Batch: 1.283s | Total: 0:09:22 | ETA: 0:33:58 | Loss: 0.0321 | Acc: 0.0236
Variable containing:
(0 ,.,.) =
1.00000e-02 *
-5.1005 4.8870 7.2142 4.1049 1.5466 -1.9869 2.5436
-6.8591 -1.9656 5.5182 -1.3171 0.4710 3.0624 -0.6712
5.6252 -7.0960 6.3153 6.6564 -1.3954 6.1527 1.1435
0.1828 3.8488 2.5727 -3.2997 -1.5428 6.1408 -2.1743
8.3194 -0.2824 -7.0051 1.2889 7.1788 -6.6295 -1.9307
-1.6547 -3.0628 2.7874 -3.6472 -0.6671 3.6397 3.9762
7.2225 -0.4128 1.4972 -6.0499 2.8183 8.2252 3.3417

By the way, the training code itself runs correctly; I am building this experiment on top of it.

Which version of PyTorch are you using?

In the version I am using (0.4.0), the optimizers do not accept parameters with requires_grad=False. We have to manually filter them out before passing them to the optimizer.

You can either do this in your code:

optimizer = torch.optim.RMSprop(filter(lambda p: p.requires_grad, model.parameters()), lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)

or update your PyTorch version.
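For reference, here is a minimal self-contained sketch of the filter approach (the toy two-layer model is just an assumption for illustration): freeze one layer, filter it out of the optimizer, and verify its weights are bit-identical after a step.

```python
import torch

# Toy model: freeze the first layer only.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 1))
for p in model[0].parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer.
opt = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=0.1)

frozen_before = model[0].weight.detach().clone()
loss = model(torch.randn(8, 4)).sum()
opt.zero_grad()
loss.backward()
opt.step()

# The frozen layer never received a gradient and is not in the optimizer,
# so its weights should be exactly unchanged.
print(torch.equal(frozen_before, model[0].weight))  # True
```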

The optimizer can’t take parameters with requires_grad=False; it will throw an error. But in your case, you haven’t actually frozen the parameters. There is a typo in the code: it should be param.requires_grad=False, not param.required_grad. Since your parameters still have requires_grad=True, no error is thrown when creating the optimizer, and consequently it optimizes them as usual.
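The typo is easy to reproduce in isolation; a quick sketch outside any training loop:

```python
import torch

layer = torch.nn.Linear(4, 2)
p = next(layer.parameters())

p.required_grad = False     # typo: silently creates a new, unused attribute
print(p.requires_grad)      # True -- the parameter is NOT frozen

p.requires_grad = False     # correct spelling actually freezes it
print(p.requires_grad)      # False
```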

I am surprised it didn’t throw an error (I checked; this doesn’t throw an error in 0.4.0 either). Also, it’s RMSprop, not RSMprop.

@albanD can you please have a look?

Setting required_grad does not throw an error because it just creates a new attribute on the Tensor.

As for optimizers only accepting params that require grad, I guess that depends on the optimizer, but I don’t think there is any strong reason why they cannot accept frozen ones. It might use a bit more memory than necessary, but for most simple optimizers that should be the worst that can happen.

Oh, thanks. Can you give us an example of an optimizer taking frozen parameters? I couldn’t think of any.

In the latest version at least, both optim.SGD and optim.Adam accept parameters with requires_grad=False. Such parameters will have 0 gradients, so with plain SGD they won’t change. Note, though, that if there is L2 regularization, or if momentum terms are already saved for these parameters, they will change! requires_grad=False only means, from the optimizer’s point of view, that the gradients will be 0; the parameters might still be changed.
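A small sketch of the momentum case (the zero gradient is assigned by hand to stand in for a frozen parameter; the values are illustrative):

```python
import torch

w = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.SGD([w], lr=0.1, momentum=0.9)

# One ordinary step to build up a momentum buffer.
(w * 2).sum().backward()
opt.step()

# Now "freeze" the parameter: its gradient is zero from here on.
w.grad = torch.zeros_like(w)
before = w.detach().clone()
opt.step()                     # gradient is 0, but the momentum buffer is not

print(torch.equal(before, w))  # False: the parameter still moved
```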


I noticed that 0.2.0 also generates an error when given a model whose parameters all have “requires_grad=False”.
And the approach you introduced (using “filter”) really helped me. Thank you very much.

Exactly. Thank you for finding the mistake I made.

Yes, “required_grad” doesn’t raise an error. OH MY GOD. Your comments really helped me understand what “requires_grad=False” actually means.

Thank you

Hi @albanD, I am currently using PyTorch v1.9.0.
Could you please tell me why, if the optimizer involves weight decay, the frozen parameters would still change? I am facing the exact same problem: part of the network is supposed to be frozen (say, the encoder), and the rest of the network has to be trained, but with weight decay (0.01 in my case) with Adam.

To do so, before starting training I just did:

for p in model.feat.parameters():
	p.requires_grad=False

where I intend to freeze the parameters of model.feat, and then I instantiate the optimizer as follows:

optim = torch.optim.Adam(model.parameters(), lr = args['lr'], weight_decay=args['weight_decay'])

Thanks!!

Hi,

Weight decay applies an update to the parameters that is independent of the gradient. So it will change them whatever the value of the gradient.
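A minimal sketch of this with Adam (the zero gradient again stands in for a frozen parameter; the lr and weight_decay values are just illustrative):

```python
import torch

w = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.Adam([w], lr=0.01, weight_decay=0.01)

w.grad = torch.zeros_like(w)   # a frozen parameter's gradient stays zero
opt.step()

# Weight decay added 0.01 * w to the (zero) gradient, so Adam still
# shrank every "frozen" weight below its starting value of 1.0.
print((w < 1.0).all())         # tensor(True)
```

The usual workaround, as earlier in this thread, is to pass only the trainable parameters to the optimizer (e.g. via filter), or to put the frozen parameters in their own parameter group with weight_decay=0.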

Thanks for the clarification!