How can I freeze only some of a layer's weights at zero, rather than the entire layer?
I tried the code below, but it doesn't freeze the specific part (indices 1:10 along the 2nd dimension) of the layer weights.
I am new to ML and started with PyTorch. I'd appreciate any help. Thanks.
for child in model_ft.children():
    print("Freezing Parameters(1->10) on the Convolution Layer", child)
    for param in child.parameters():
        param.data[:, 1:10, :, :].zero_()
        param.data[:, 1:10, :, :].requires_grad = False
I don't think freezing specific parts of a parameter is possible in PyTorch, because the requires_grad flag is set per Parameter (a whole tensor of weights), not per individual weight.
One possible approach is to zero the gradient manually before you call the optimizer step.
After you compute the gradients with backward(), call
param.grad[:, 1:10, :, :] = 0
to achieve what you want.
You can also automate this with a backward hook in PyTorch (for example, Tensor.register_hook).
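For anyone who wants to see the hook version spelled out, here is a minimal sketch. The layer sizes, the input tensor and the function name zero_frozen_grad are made up for illustration; only the 1:10 slice comes from the question. It assumes plain SGD with no momentum and no weight decay, so a zero gradient really means no update.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical layer for illustration: 16 input channels, so the slice 1:10 exists.
conv = nn.Conv2d(in_channels=16, out_channels=8, kernel_size=3)

# Zero the slice once, so it starts at zero.
with torch.no_grad():
    conv.weight[:, 1:10, :, :].zero_()

def zero_frozen_grad(grad):
    # Return a modified copy instead of changing the incoming gradient in place.
    grad = grad.clone()
    grad[:, 1:10, :, :] = 0
    return grad

# The hook runs each time a gradient for conv.weight is computed in backward().
handle = conv.weight.register_hook(zero_frozen_grad)

# Assumption: plain SGD, no momentum, no weight decay.
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

x = torch.randn(4, 16, 12, 12)  # dummy input
for _ in range(3):
    optimizer.zero_grad()
    loss = conv(x).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()

frozen_is_zero = bool((conv.weight[:, 1:10, :, :] == 0).all())
print(frozen_is_zero)
```

Calling handle.remove() detaches the hook again if the slice should become trainable later.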
Thanks Sonsang,
This works: I can see that the required weights are still zero after training, but the training accuracy has dropped. Initializing the non-zero weights with xavier_normal didn't help.
I'm not sure whether I am missing something else in my setup.
Does this approach behave the same as freezing the weights at zero from the beginning?
Just to double check, how is this mathematically correct? Since the chain rule involves products (and sums), couldn't it lead to earlier layers having zero gradient even though they should not? Consider a simple product of three numbers abc, whose derivative would be the product a’*b’*c’. Then setting b’=0 yields the wrong derivative when it should be a’*b’, no?
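A small check may help with the chain-rule worry. As far as I understand it, backward() has already propagated gradients through every layer by the time you touch param.grad, and the gradient of an earlier layer depends on the later layer's weight values, not on the later layer's stored .grad. The sketch below uses a made-up two-layer model (the names first and second are hypothetical) to show that zeroing part of one layer's .grad after backward() leaves the other layer's gradient untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model: `first` feeds into `second`.
first = nn.Linear(5, 4)
second = nn.Linear(4, 3)
x = torch.randn(2, 5)

loss = second(torch.relu(first(x))).sum()
loss.backward()  # the chain rule is applied here, for every layer

first_grad_before = first.weight.grad.clone()

# Zero part of the later layer's stored gradient AFTER backward() has finished.
second.weight.grad[:, 1:3] = 0

# The earlier layer's gradient was already computed and is not recomputed.
first_grad_unchanged = torch.equal(first.weight.grad, first_grad_before)
print(first_grad_unchanged)
```

So the zeroing only changes which weights the optimizer updates; it does not alter the derivative that reaches earlier layers.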
Your suggestion works, and may be really useful in a lot of cases, but it may not have the same effect as requires_grad=False.
It works because it sets the selected gradients to zero and ensures that certain weights do not change, but the gradients are still calculated for all the weights, including the ones that have been frozen.
I am concerned about the outcome, since I am not sure it produces the same result as setting the requires_grad flag to False.
I think (not sure, though) that the loss calculation might omit the frozen parameters from inference when requires_grad is set to False. That would focus the loss calculation on the parameters with requires_grad set to True, and might eventually cause those parameters to be penalized as if they were responsible for all of the network's loss.
Please correct me if I am wrong. I know I am assuming a lot of things, but maybe these thoughts can help someone.
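One way to test that assumption is to compare the loss before and after flipping the flag. To my knowledge requires_grad only controls whether autograd records gradients for a tensor; the forward pass still uses the frozen weights as usual. The toy model, input and target below are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model and data.
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 1))
x = torch.randn(3, 5)
target = torch.zeros(3, 1)
loss_fn = nn.MSELoss()

loss_all_trainable = loss_fn(model(x), target).item()

# Freeze the whole first layer the usual way.
for param in model[0].parameters():
    param.requires_grad = False

loss = loss_fn(model(x), target)
loss_first_frozen = loss.item()

# The frozen layer still takes part in the forward pass, so the loss is identical.
same_loss = loss_all_trainable == loss_first_frozen

loss.backward()
frozen_has_no_grad = model[0].weight.grad is None      # no gradient is stored for it
trainable_has_grad = model[2].weight.grad is not None  # the other layer still gets one
print(same_loss, frozen_has_no_grad, trainable_has_grad)
```

In this sketch the only difference is that no gradient is stored for the frozen layer; the loss value itself does not change.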
What does this mean: that the first 100 will be frozen? I tested this, and it is certainly not wrong, but I don't really understand which weights are frozen, because they are not the first 100.
Hi, let me ask a quick question:
Say I have a model with five layers in total and I want to freeze the 3rd layer (counting from the first layer). The 3rd layer's gradient will still affect the updates of the 2nd and 1st layers, but its effect is consistent, so there won't be a problem. Is my understanding correct?
Hi, I see this issue is going on for a really long time, so I wrote this simple repo in PyTorch. I hope someone may find it useful. It supports only partial freezing of Conv2d, Conv3d and Linear layers, but these are the most common anyway. If you want me to extend it to other layers, feel free to send me a message.
Hi, this does not always work as intended; it depends on the optimizer. For example, with an optimizer such as RAdam, even if the current gradient is set to zero, the parameters will still be updated (although only slightly) because of the exponential moving average of previous gradients.
To achieve exactly what we want, the only thing I can come up with is the naive implementation: save the parameter values before opt.step() and restore them afterwards.
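The optimizer-state effect is easy to reproduce. The sketch below uses SGD with momentum rather than RAdam, simply because the arithmetic is easier to follow; the single three-element parameter and the hand-set gradients are made up for illustration.

```python
import torch

# A single hypothetical parameter vector; index 0 is the entry we try to "freeze".
param = torch.nn.Parameter(torch.ones(3))
optimizer = torch.optim.SGD([param], lr=0.1, momentum=0.9)

# Step 1: an ordinary update with a non-zero gradient, which fills the momentum buffer.
param.grad = torch.ones(3)
optimizer.step()
value_after_step1 = param[0].item()

# Step 2: the gradient of entry 0 is zeroed, as in the gradient-zeroing approach.
param.grad = torch.tensor([0.0, 1.0, 1.0])
optimizer.step()
value_after_step2 = param[0].item()

# Entry 0 still moves, driven only by the momentum left over from step 1.
still_moved = value_after_step2 != value_after_step1
print(value_after_step1, value_after_step2, still_moved)
```

If the gradient of an entry is zeroed from the very first step and weight decay is off, I would expect its momentum entry to stay at zero as well, so the drift shows up mainly when freezing starts partway through training or when weight decay is enabled.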
Hi! In your case, would it be possible to use a "mask" matrix and a temporary matrix? You record the current values of the entries you do not want to update in the temporary matrix, and use the mask matrix to indicate which entries they are.
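A rough sketch of that mask idea, combined with the save-and-restore step, could look like this. The layer, the dummy data, and the names frozen_mask and saved are invented for illustration; Adam with weight decay is chosen only as an example of an optimizer that would otherwise move the frozen entries. For simplicity the whole weight tensor is copied, although only the masked entries are restored.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(16, 8, kernel_size=3)  # hypothetical layer

# Boolean "mask matrix": True marks the entries that must never change.
frozen_mask = torch.zeros_like(conv.weight, dtype=torch.bool)
frozen_mask[:, 1:10, :, :] = True

with torch.no_grad():
    conv.weight[:, 1:10, :, :].zero_()

initial_weight = conv.weight.detach().clone()

# Assumption: any optimizer works here; Adam with weight decay is just an example.
optimizer = torch.optim.Adam(conv.parameters(), lr=1e-2, weight_decay=1e-2)

x = torch.randn(4, 16, 12, 12)  # dummy input
for _ in range(3):
    optimizer.zero_grad()
    conv(x).pow(2).mean().backward()  # dummy loss

    saved = conv.weight.detach().clone()  # "temporary matrix" taken before the step
    optimizer.step()
    with torch.no_grad():
        # Put the saved values back wherever the mask is True.
        conv.weight.copy_(torch.where(frozen_mask, saved, conv.weight))

frozen_unchanged = torch.equal(conv.weight[frozen_mask], initial_weight[frozen_mask])
others_trained = not torch.equal(conv.weight[~frozen_mask], initial_weight[~frozen_mask])
print(frozen_unchanged, others_trained)
```

Because the restore happens after every step, the result should not depend on what the optimizer keeps in its internal state, at the cost of one extra copy of the weights per step.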