Passing half of the weights in model.fc.weight of a ResNet model to the Adam optimizer

Hi everyone,
I would like to pass half of the parameters in model.fc.weight of a ResNet model to the Adam optimizer. To be specific, suppose the weight stored in model.fc.weight is 1000 by 512; then I want to pass half of the weights, a 500 by 512 matrix, to the Adam optimizer. I proceeded as follows:

w_0 = model.fc.weight[0:500, :]  # slice of the first 500 rows
optimizer = torch.optim.Adam([w_0], lr=0.001)
optimizer.step()

I get the following error:
ValueError: can’t optimize a non-leaf Tensor

Then I tried:

w_0 = model.fc.weight[0:500, :].retain_grad()  # retain_grad() returns None
optimizer = torch.optim.Adam([w_0], lr=0.001)
optimizer.step()

I get the following error:
TypeError: optimizer can only optimize Tensors, but one of the params is NoneType

Hi
If I am understanding correctly, you want to pass only 50% of the weights to the optimizer and leave the remaining 50% untouched.

  1. You can’t do this with requires_grad = True on the slice, since requires_grad works only at the level of whole (leaf) tensors, not views of them (see the short demo after this list).

  2. You can achieve this by masking 50% of your weights so that they don’t get updated.
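
To see why both of your attempts fail, here is a minimal demo (assuming a torchvision resnet18, whose fc weight is 1000 by 512): slicing a parameter creates a non-leaf view, and retain_grad() returns None rather than the tensor, which is exactly what the two errors report.

import torchvision

model = torchvision.models.resnet18()

w_0 = model.fc.weight[0:500, :]  # slicing a parameter gives a non-leaf view
print(w_0.is_leaf)               # False -> "can't optimize a non-leaf Tensor"
print(w_0.retain_grad())         # None -> "one of the params is NoneType"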

Create a new tensor operation in the computational graph:

weight = model.fc.weight
mask = torch.zeros_like(weight)  # requires_grad is False by default
mask[0:500, :] = 1               # 1 for the rows that should keep training
newWeight = weight * mask

This small stub shows how to mask neurons and train. I hope you find it useful. I can explain in more detail if you provide more information (i.e. code).
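
As a usage sketch, the masked weight can replace the layer’s own parameter via the functional API (here features, criterion, and targets are hypothetical stand-ins; features would be the 512-dimensional activations entering model.fc):

import torch.nn.functional as F

# hypothetical: features are the activations that normally feed model.fc
logits = F.linear(features, newWeight, model.fc.bias)
loss = criterion(logits, targets)
loss.backward()  # rows of model.fc.weight where the mask is 0 get zero gradient

Since newWeight = weight * mask, the backward pass multiplies the incoming gradient by the same mask, so the masked-out rows of model.fc.weight receive a zero gradient.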

Thanks

Small addition:
I would suggest masking the gradients instead of the weights directly, so that these parameters won’t be updated.
However, I think weight decay might still update these parameters even if their gradient is zero.
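
A minimal sketch of that suggestion, assuming the same mask as above and hypothetical inputs, targets, and criterion: a hook on the parameter zeroes the gradient of the frozen rows during backward, and weight_decay is set to 0 to sidestep the decay issue just mentioned.

# mask from the stub above: 1 for trainable rows, 0 for frozen rows
model.fc.weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0)

output = model(inputs)             # hypothetical inputs/targets/criterion
loss = criterion(output, targets)
loss.backward()                    # the hook zeroes grads of the frozen rows
optimizer.step()                   # only the unmasked rows of fc.weight move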

Hi Ptrblck,
Thanks for your response. I have realized that it is well worth learning PyTorch’s autograd inside out and bottom up, and that you know the autograd engine very well. Could you kindly point me to a book or tutorial that explains the autograd engine in enough detail that one can understand it, predict its behavior, and avoid many of its pitfalls, such as “RuntimeError: leaf variable has been moved into the graph interior” and a whole host of other errors? Thanks a lot.

Hi Abdulrahman,

I’m not a real expert on the autograd engine, but I think the docs would be a good starting point, as well as @ezyang’s slides on PyTorch internals. 🙂

Hi Ptrblck,
Many thanks for the response.