How to freeze a subset of weights of a layer?

I’d like to remove a subset of connections from a Conv1d. I thought about setting the value of each of those connections to 0 and disabling gradient computation for them.
I’ve seen a lot of Google/forum results about freezing an entire parameter (using requires_grad=False), but I want to freeze only a subset of the parameter’s weights, not all of them.
How can I do that?

Quick example of the problem

import torch

c = torch.nn.Conv1d(3, 3, 5)
params = list(c.parameters())
param = params[0]
param
Out[16]: 
Parameter containing:
tensor([[[-0.1008,  0.1343,  0.1057, -0.0515, -0.0182],
         [-0.1379, -0.1431,  0.1962, -0.2398, -0.1028],
         [-0.2206,  0.0444, -0.1410, -0.2043, -0.1974]],
        [[ 0.2380,  0.1988, -0.2037, -0.0818,  0.0595],
         [ 0.1872, -0.1943,  0.1382,  0.1019, -0.2491],
         [-0.0744,  0.0346,  0.0224, -0.0474, -0.0807]],
        [[-0.1499, -0.2514, -0.2398,  0.1695,  0.1787],
         [-0.2560,  0.0071,  0.0072, -0.2318,  0.0216],
         [-0.1403,  0.0443, -0.0762,  0.0318, -0.1797]]], requires_grad=True)
param[0][0][0].requires_grad = False
param[0][0][0].requires_grad
Out[18]: True

Hi,

I’m afraid requires_grad is a property of the whole Tensor; you cannot set it for only a subset of the Tensor’s elements.

The way I would recommend is simply to save the values you want to keep and write them back after the optimizer step (see the sketch below).
Other tricks, like setting the gradients to 0, can be dangerous: an optimizer with weight decay, for example, will still update a weight even when its gradient is 0.
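Here is a minimal sketch of that save-and-restore approach. The frozen_mask, the dummy input, and the dummy loss are my own illustrative assumptions, not something from the thread:

```python
import torch

conv = torch.nn.Conv1d(3, 3, 5)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1, weight_decay=1e-4)

# Hypothetical mask: True marks the weights we want to keep fixed.
frozen_mask = torch.zeros_like(conv.weight, dtype=torch.bool)
frozen_mask[0, 0, 0] = True
frozen_values = conv.weight.detach().clone()

x = torch.randn(8, 3, 20)        # dummy input batch
loss = conv(x).pow(2).mean()     # dummy loss
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Write the saved values back over the frozen positions after the step.
with torch.no_grad():
    conv.weight[frozen_mask] = frozen_values[frozen_mask]

print(torch.equal(conv.weight[0, 0, 0], frozen_values[0, 0, 0]))  # True
```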

Another, more involved option is a custom layer that stores only the learnable part as a Parameter, recreates the full weight tensor from the fixed values (in the forward!), and then calls the functional conv1d op with that weight. A rough sketch follows.
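A rough sketch of that custom-layer idea, assuming a boolean mask convention; the class name, mask handling, and sizes are mine, just for illustration:

```python
import torch
import torch.nn.functional as F

class PartiallyFrozenConv1d(torch.nn.Module):
    """Conv1d where only the entries selected by `learnable_mask` are trained."""

    def __init__(self, in_channels, out_channels, kernel_size, learnable_mask):
        super().__init__()
        init = torch.nn.Conv1d(in_channels, out_channels, kernel_size)
        self.register_buffer("learnable_mask", learnable_mask)
        self.register_buffer("fixed_weight", init.weight.detach().clone())
        # Only the learnable entries live in a Parameter (stored as a flat tensor).
        self.learnable_values = torch.nn.Parameter(init.weight.detach()[learnable_mask].clone())
        self.bias = torch.nn.Parameter(init.bias.detach().clone())

    def forward(self, x):
        # Recreate the full weight from the fixed buffer and the learnable
        # entries, then call the functional conv1d with it.
        weight = self.fixed_weight.clone()
        weight[self.learnable_mask] = self.learnable_values
        return F.conv1d(x, weight, self.bias)

# Example: freeze weight[0, 0, 0], train everything else.
mask = torch.ones(3, 3, 5, dtype=torch.bool)
mask[0, 0, 0] = False
layer = PartiallyFrozenConv1d(3, 3, 5, mask)
out = layer(torch.randn(8, 3, 20))
```

Since the frozen entries never appear in any Parameter, no optimizer setting (weight decay, momentum, etc.) can touch them.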

Say a is a tensor, and we want to update only certain values within it.

Could it be done this way:

import torch

a = torch.rand(5)
b = torch.rand(5)
print(b)
a.requires_grad = True
c = a.detach()                   # detached view of a: no gradient flows through it
c.requires_grad = False          # redundant: detach() already returns requires_grad=False
d = torch.where(b > 0.5, a, c)   # take a (trainable) where b > 0.5, c (detached) elsewhere
y = d * 2
y.mean().backward()
print(a.grad)                    # non-zero only where b > 0.5

The output is:

tensor([0.6207, 0.4534, 0.2759, 0.1856, 0.9388])
tensor([0.4000, 0.0000, 0.0000, 0.0000, 0.4000])

so a receives a gradient (2/5 = 0.4) only at the positions where b > 0.5, i.e. where d was taken from a; the entries routed through the detached c get no gradient.

I have the same problem. How did you end up solving it?