How do I freeze specific weights in a layer?

Drezal_Loh · December 1, 2020, 8:45am

I am new to PyTorch.
I am trying to freeze specific weights in my model. How do I do this?
Here is the output of param.grad for each parameter:

for param in model.parameters():
    print(param.grad)

tensor([[ 1.2880e+00, 1.9992e+00, -1.8617e-01, 9.7155e-04, 6.1021e-02,
-1.6990e-01, -1.6990e-01],
[ 1.2238e+00, 1.6739e+00, -1.5423e-01, 6.8004e-04, 4.8646e-02,
-1.3877e-01, -1.3877e-01],
[-1.4395e-01, 2.6961e-01, -3.0931e-01, 1.2276e-03, 4.1116e-02,
-9.4576e-02, -9.4576e-02],
[-1.4087e+00, -1.7222e+00, -1.0639e-01, 1.6479e-04, -1.9364e-02,
7.0537e-02, 7.0537e-02],
[-3.8626e+00, -5.9818e+00, -7.5635e-01, 1.4528e-04, -8.4420e-02,
2.8996e-01, 2.8996e-01],
[ 4.5561e-01, 2.6636e+00, 3.0582e-01, 1.1337e-03, 8.0234e-02,
-2.2834e-01, -2.2834e-01],
[ 9.2553e+00, -2.3199e+00, -4.4024e+00, 3.2237e-03, 1.1196e-01,
-2.6179e-01, -2.6179e-01],
[ 5.0038e-01, -2.0895e-01, -3.9626e-01, 4.9325e-04, 2.0533e-02,
-5.1361e-02, -5.1361e-02]])
tensor([[-0.0604, -0.0332, -0.1097, 0.1066, 0.0263, -0.0495, -0.0896, -0.0305],
[-0.0236, -0.0129, -0.0855, 0.0561, 0.0050, -0.0431, -0.0806, -0.0351],
[-0.0619, 0.0210, -0.0466, 0.1049, 0.0903, 0.0441, -0.0294, 0.0410],
[ 0.0039, 0.0439, 0.0502, -0.0047, 0.0523, 0.0802, 0.0768, 0.0853],
[ 0.1193, 0.1690, -0.0268, -0.0964, 0.0923, 0.1122, 0.3309, 0.3556],
[-0.1557, -0.0918, 0.0501, 0.1837, 0.1018, 0.0462, -0.1316, -0.0694],
[ 0.2985, 0.2872, -0.2841, -0.2909, -0.2793, -0.2900, 0.0360, -0.2768],
[ 0.0167, 0.0288, -0.0294, -0.0127, -0.0085, -0.0154, 0.0204, 0.0048]])
tensor([-0.1699, -0.1388, -0.0946, 0.0705, 0.2900, -0.2283, -0.2618, -0.0514])
tensor([-0.1699, -0.1388, -0.0946, 0.0705, 0.2900, -0.2283, -0.2618, -0.0514])
tensor([[-3.5904e+00, -3.2880e+00, 3.6735e+00, 3.5596e+00, 3.4067e+00,
3.9122e+00, 1.3480e+00, 3.1120e+00],
[ 1.0125e+00, 1.0352e+00, -3.3861e-01, -1.0392e+00, -2.5707e-02,
3.5781e-03, 2.3016e+00, 1.6572e+00],
[ 1.4970e+00, 1.7948e+00, -1.7893e+00, -1.2388e+00, -7.7342e-01,
-1.2879e+00, 4.3067e-01, -1.0760e-01],
[-6.6468e-02, -1.3030e-01, 1.3785e-01, 1.6426e-02, -2.2650e-02,
4.3137e-02, -3.3531e-02, -8.2672e-02],
[-3.2205e-02, -4.9927e-02, 8.5638e-02, 1.8152e-02, 4.9936e-02,
7.4584e-02, 1.0584e-01, 1.1505e-01]])
tensor([ 3.7052, 1.1885, -1.4792, 0.1808, 0.1393])

The particular weights I wish to freeze (i.e., keep from changing by setting their gradient to zero) are those in the first tensor, [:, 5:].

Thanks
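To pin down which parameter "the 1st tensor" is, it can help to print each parameter's name next to its shape with named_parameters(), which yields parameters in the same order as parameters(). The sketch below uses a hypothetical model, an nn.RNN(7, 8) followed by nn.Linear(8, 5), chosen only because its parameter shapes match the gradients printed above; the poster's actual model is not shown, and Net is a made-up name.

```python
import torch
import torch.nn as nn

# Hypothetical model: its parameter shapes match the printed gradients,
# but this is only a guess at the poster's architecture.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(input_size=7, hidden_size=8, batch_first=True)
        self.head = nn.Linear(8, 5)

    def forward(self, x):
        out, _ = self.rnn(x)          # out: (batch, seq_len, 8)
        return self.head(out[:, -1])  # use the last time step

model = Net()

# named_parameters() gives the same order as parameters(), plus a name,
# so "the 1st tensor" becomes an explicit attribute.
shapes = []
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
    shapes.append((name, tuple(param.shape)))
```

Under this guess, the slice to freeze would be model.rnn.weight_ih_l0[:, 5:].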

Alexey_Demyanchuk · December 1, 2020, 10:56am

Hey. It is not 100% clear to me what you are trying to achieve, but I'll answer as far as I understand.
PyTorch weight tensors all have a requires_grad attribute. If it is set to False, the weights of that ‘layer’ will not be updated during the optimization process; they are simply frozen.
You can do it like this; here the entire 0th weight tensor is frozen:

for i, param in enumerate(m.parameters()):
    if i == 0:
        param.requires_grad = False
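A slightly more explicit variant of the same idea, as a sketch: look the tensor up by attribute name rather than by index, and optionally hand the optimizer only the parameters that still require gradients. The toy nn.Linear and the squared-output loss are placeholders.

```python
import torch

torch.manual_seed(0)
m = torch.nn.Linear(4, 2)

# Freeze the whole weight tensor (parameter 0); the bias stays trainable.
m.weight.requires_grad_(False)

# Giving the optimizer only trainable parameters is optional but tidy.
opt = torch.optim.Adam([p for p in m.parameters() if p.requires_grad], lr=0.1)

w_before = m.weight.detach().clone()
b_before = m.bias.detach().clone()

loss = m(torch.rand(3, 4)).pow(2).mean()  # placeholder loss
loss.backward()
opt.step()

print(m.weight.grad)                    # None: no gradient is computed for it
print(torch.equal(m.weight, w_before))  # True: frozen
print(torch.equal(m.bias, b_before))    # False: still trained
```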

I am not aware of a way to set requires_grad = False for only a slice of the weights. At least, I can’t do it without PyTorch complaining.
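The complaint can be reproduced in a few lines. A slice such as w[:, 5:] is a view produced by an autograd operation, so it is not a leaf tensor, and requires_grad can only be changed on leaf tensors. A minimal sketch (the (8, 7) shape just mirrors the first tensor in the question):

```python
import torch

w = torch.nn.Parameter(torch.randn(8, 7))

# The parameter itself is a leaf tensor, so its flag can be changed...
print(w.is_leaf)     # True

# ...but a slice of it is a non-leaf view created by autograd,
# and PyTorch refuses to change requires_grad on it.
part = w[:, 5:]
print(part.is_leaf)  # False

try:
    part.requires_grad = False
    error = None
except RuntimeError as e:
    error = str(e)
print(error)
```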

Anyway, you can zero a slice of the gradients before the optimization step, so that exactly this slice of the weights doesn’t change during the step. Here is a dummy example:

import torch

m = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(m.parameters())
x = torch.rand((1, 4))
y = torch.tensor([[0, 1]], dtype=torch.float32)
crit = torch.nn.BCEWithLogitsLoss()

print('before optimizer step')
for param in m.parameters():
    print(param)

out = m(x)
loss = crit(out, y)  # BCEWithLogitsLoss expects (input logits, target)
loss.backward()

# Zero the gradient of columns 1: of the weight matrix (parameter 0)
for i, param in enumerate(m.parameters()):
    if i == 0:
        param.grad[:, 1:] = 0

opt.step()
print('after optimizer step')
for param in m.parameters():
    print(param)

After running this, I can see that the slice [:, 1:] of weight tensor 0 didn’t change after the optimizer step.
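A variant that avoids editing .grad by hand is to register a gradient hook on the weight with Tensor.register_hook, so the masking happens automatically on every backward pass. This is a swapped-in technique, not part of the answer above. It zeroes the same gradient entries, so it shares the same limitation with optimizers that use momentum or weight decay; the sketch therefore uses plain SGD. The loss is a placeholder.

```python
import torch

torch.manual_seed(0)
m = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(m.parameters(), lr=0.1)  # plain SGD: no momentum, no weight decay

# 1 where the weight may train, 0 where it should stay frozen
# (columns 1: of the weight, matching the example above).
mask = torch.ones_like(m.weight)
mask[:, 1:] = 0

# The hook runs on every backward pass and replaces the incoming
# gradient of m.weight with a masked copy.
m.weight.register_hook(lambda grad: grad * mask)

w_before = m.weight.detach().clone()
for _ in range(5):
    opt.zero_grad()
    loss = m(torch.rand(3, 4)).pow(2).mean()
    loss.backward()
    opt.step()

print(torch.equal(m.weight[:, 1:], w_before[:, 1:]))  # True: frozen slice
print(torch.equal(m.weight[:, :1], w_before[:, :1]))  # False: column 0 trained
```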

Hope it helps

KFrank · December 1, 2020, 6:34pm

Hi Alexey and Drezal!

As Alexey notes, you can’t apply requires_grad to only part of a
tensor.

Please note that this approach (zeroing out parts of .grad before
calling opt.step()) doesn’t work in general. Some optimizers, e.g.,
those using momentum or weight decay, will change the weights
even if .grad is zero.
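This is easy to check with torch.optim.SGD. In the sketch below (drift_after_zero_grad is a hypothetical helper), one step is taken with a nonzero gradient, then a second step with an explicitly zeroed gradient; only plain SGD leaves the weight where it was.

```python
import torch

def drift_after_zero_grad(**sgd_kwargs):
    """Take one normal step, then one step with an all-zero gradient,
    and return how far that second step moved the weight."""
    w = torch.nn.Parameter(torch.ones(3))
    opt = torch.optim.SGD([w], lr=0.1, **sgd_kwargs)

    # Step 1: an ordinary, nonzero gradient.
    w.grad = torch.ones(3)
    opt.step()
    before = w.detach().clone()

    # Step 2: the gradient is explicitly zero, as in the "zero .grad" approach.
    w.grad = torch.zeros(3)
    opt.step()
    return (w.detach() - before).abs().max().item()

print(drift_after_zero_grad())                  # 0.0: plain SGD leaves w alone
print(drift_after_zero_grad(momentum=0.9))      # > 0: the momentum buffer keeps pushing
print(drift_after_zero_grad(weight_decay=0.1))  # > 0: the decay term is added to the gradient
```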

The more general approach is to copy the weights you want
frozen, call opt.step(), and then write the frozen weights back
into your param.
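A minimal sketch of that copy-and-restore approach, assuming the frozen slice is columns 1: of an nn.Linear weight (as in Alexey's example) and using AdamW with weight decay, a setting where zeroing .grad alone would not be enough. The loss is a placeholder.

```python
import torch

torch.manual_seed(0)
m = torch.nn.Linear(4, 2)
# Weight decay would move the slice even with a zero gradient.
opt = torch.optim.AdamW(m.parameters(), lr=0.1, weight_decay=0.1)

w_initial = m.weight.detach().clone()

for _ in range(10):
    opt.zero_grad()
    loss = m(torch.rand(3, 4)).pow(2).mean()
    loss.backward()

    # 1. Copy the weights that should stay frozen.
    frozen = m.weight.detach()[:, 1:].clone()
    # 2. Let the optimizer update everything.
    opt.step()
    # 3. Write the saved values back without autograd tracking.
    with torch.no_grad():
        m.weight[:, 1:] = frozen

print(torch.equal(m.weight[:, 1:], w_initial[:, 1:]))  # True
print(torch.equal(m.weight[:, :1], w_initial[:, :1]))  # False
```

The optimizer still keeps internal state for the frozen entries, but whatever it does to them is overwritten after every step.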

Best.

K. Frank

Drezal_Loh · December 2, 2020, 2:01am

Dear Alexey and KFrank,

I have tried both methods, and they both worked!
Thank you for pointing out that zeroing out parts of .grad will not always work.

I am using MSELoss() with the Adam optimizer.

Regards,
Eric
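One plausible reason both methods worked here: Adam's default weight_decay is 0, and if the gradient slice is zeroed from the very first step, Adam's running averages for those entries stay exactly zero, so its update for them is zero as well. Once earlier steps have left nonzero history, zeroing .grad no longer keeps the slice fixed. A sketch comparing the two cases (adam_slice_drift is a hypothetical helper; the model and loss are toys):

```python
import torch

def adam_slice_drift(unmasked_warmup_steps):
    """Run Adam on a toy layer; after `unmasked_warmup_steps` ordinary steps,
    zero the gradient of weight columns 1: on every step. Return how far
    that slice moved during the masked phase."""
    torch.manual_seed(0)
    m = torch.nn.Linear(4, 2)
    opt = torch.optim.Adam(m.parameters(), lr=0.1)  # default weight_decay=0

    def train_step(mask_grad):
        opt.zero_grad()
        m(torch.rand(3, 4)).pow(2).mean().backward()
        if mask_grad:
            m.weight.grad[:, 1:] = 0
        opt.step()

    for _ in range(unmasked_warmup_steps):
        train_step(mask_grad=False)
    start = m.weight.detach()[:, 1:].clone()
    for _ in range(5):
        train_step(mask_grad=True)
    return (m.weight.detach()[:, 1:] - start).abs().max().item()

print(adam_slice_drift(0))  # 0.0: Adam's averages for the slice stay zero
print(adam_slice_drift(3))  # > 0: history from earlier steps keeps moving the slice
```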

Alexey_Demyanchuk · December 2, 2020, 10:16am

Thanks, this is a really important note!

sakimarquis · April 11, 2022, 10:40am

That’s a super important note! Adam has a momentum component, and zeroing the gradient indeed doesn’t work well with it. Is there any way to clear out all the historical gradient state?
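One possible approach, sketched under assumptions: for plain torch.optim.Adam (weight_decay=0, amsgrad=False), the per-parameter history lives in opt.state[param] under the keys 'exp_avg' and 'exp_avg_sq'. Zeroing those entries for the slice, and zeroing its gradient from then on, stops Adam from moving it. These are internal state names rather than a documented freezing API, other optimizers keep different state, and the state only exists after the first opt.step(). The model, loss, and train_step helper are hypothetical.

```python
import torch

torch.manual_seed(0)
m = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(m.parameters(), lr=0.1)  # weight_decay=0, amsgrad=False

def train_step(mask_grad):
    opt.zero_grad()
    m(torch.rand(3, 4)).pow(2).mean().backward()
    if mask_grad:
        m.weight.grad[:, 1:] = 0  # keep the slice's gradient at zero
    opt.step()

# A few ordinary steps: Adam now holds nonzero history for every entry.
for _ in range(3):
    train_step(mask_grad=False)

# Clear Adam's running averages for the slice that is about to be frozen.
# Assumes Adam's internal state keys 'exp_avg' and 'exp_avg_sq'.
state = opt.state[m.weight]
state["exp_avg"][:, 1:] = 0
state["exp_avg_sq"][:, 1:] = 0

start = m.weight.detach()[:, 1:].clone()
for _ in range(5):
    train_step(mask_grad=True)

print(torch.equal(m.weight[:, 1:], start))  # True
```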

ndvbd · August 12, 2023, 9:56pm

What about multiplying the tensor by a constant mask (with requires_grad=False)?
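Multiplying the weight by a constant mask does make the gradient exactly zero at the masked entries from the first step, but on its own it also makes those entries act as zeros in the forward pass. To keep them at their current values instead, one option is to blend the trainable weight with a frozen copy stored as a buffer. A sketch, where PartiallyFrozenLinear is a hypothetical module, not a PyTorch class, and the loss is a placeholder:

```python
import torch
import torch.nn.functional as F

class PartiallyFrozenLinear(torch.nn.Module):
    """Hypothetical helper: a linear layer whose weight entries where
    `trainable_mask` is 0 stay pinned to their initial values."""

    def __init__(self, in_features, out_features, trainable_mask):
        super().__init__()
        init = torch.nn.Linear(in_features, out_features)
        self.weight = torch.nn.Parameter(init.weight.detach().clone())
        self.bias = torch.nn.Parameter(init.bias.detach().clone())
        # Buffers are saved with the module but never given to the optimizer.
        self.register_buffer("mask", trainable_mask.float())
        self.register_buffer("frozen_weight", init.weight.detach().clone())

    def effective_weight(self):
        # Trainable entries come from self.weight, frozen ones from the buffer;
        # the gradient w.r.t. self.weight is exactly 0 wherever mask == 0.
        return self.weight * self.mask + self.frozen_weight * (1 - self.mask)

    def forward(self, x):
        return F.linear(x, self.effective_weight(), self.bias)

torch.manual_seed(0)
mask = torch.ones(2, 4)
mask[:, 1:] = 0  # freeze columns 1:
layer = PartiallyFrozenLinear(4, 2, mask)
opt = torch.optim.AdamW(layer.parameters(), lr=0.1, weight_decay=0.1)

before = layer.effective_weight().detach().clone()
for _ in range(10):
    opt.zero_grad()
    layer(torch.rand(3, 4)).pow(2).mean().backward()
    opt.step()
after = layer.effective_weight().detach()

print(torch.equal(after[:, 1:], before[:, 1:]))  # True
print(torch.equal(after[:, :1], before[:, :1]))  # False
```

Because the forward pass never reads self.weight at the frozen positions, whatever momentum or weight decay does to those entries has no effect on the layer's output.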