I want to train only two of the weights of a linear layer. Given w = (w1, w2, w3, ..., w100), I want to train just w1 and w2. How do I select these two weights for training?

Hi,

The weight of a linear layer is actually a matrix (n_hidden1 x n_feature), not a vector. Below is how I would stop W[0, 0] from training.

model.hidden1.weight[0, 0].requires_grad = False

Likewise, you can set requires_grad to False for any other weights you do not want to learn.

EDIT: Frank pointed out a mistake in this solution. requires_grad can only be set on a whole tensor, not on an individual element.



Hi Pranavan (and Jin)!

No, this won’t do what you want. requires_grad applies to entire
tensors, not to individual elements. And to prevent unexpected
errors, PyTorch won’t let you do this.

Probably the easiest way to freeze part of a tensor is to store the
value of the element you want frozen, let the optimizer modify the
entire tensor, and then restore the original value of the element
you want frozen.


>>> import torch
>>> torch.__version__
>>> _ = torch.manual_seed (2021)
>>> lin = torch.nn.Linear (2, 3)
>>> lin.weight[0, 0].requires_grad = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
>>> opt = torch.optim.SGD (lin.parameters(), lr = 0.1)
>>> lin.weight
Parameter containing:
tensor([[-0.5226,  0.0189],
        [ 0.3430,  0.3053],
        [ 0.0997, -0.4734]], requires_grad=True)
>>> w00 = lin.weight[0, 0].item()
>>> lin (torch.ones (2)).sum().backward()
>>> opt.step()
>>> lin.weight
Parameter containing:
tensor([[-6.2264e-01, -8.1068e-02],
        [ 2.4303e-01,  2.0531e-01],
        [-3.3811e-04, -5.7337e-01]], requires_grad=True)
>>> with torch.no_grad():
...     lin.weight[0, 0] = w00
>>> lin.weight
Parameter containing:
tensor([[-5.2264e-01, -8.1068e-02],
        [ 2.4303e-01,  2.0531e-01],
        [-3.3811e-04, -5.7337e-01]], requires_grad=True)
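
For the original question (train only w1 and w2 out of w1 ... w100), the same store-and-restore idea could be applied after every optimizer step. The following is only a minimal sketch under assumptions that are not in the thread: the layer size `Linear(100, 1)`, the random `x` and `y`, the MSE loss, and the names `trainable` and `frozen_values` are all made up for illustration.

```python
import torch

torch.manual_seed(2021)
lin = torch.nn.Linear(100, 1)   # weight has shape (1, 100): w1 ... w100
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

# True marks the entries that are allowed to train (here w1 and w2).
trainable = torch.zeros_like(lin.weight, dtype=torch.bool)
trainable[0, :2] = True

# Snapshot of the initial weights, used to restore the frozen entries.
frozen_values = lin.weight.detach().clone()

x = torch.randn(8, 100)   # made-up inputs
y = torch.randn(8, 1)     # made-up targets

for _ in range(5):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(lin(x), y)
    loss.backward()
    opt.step()            # the optimizer updates the entire tensor
    with torch.no_grad():
        # Put every frozen entry back to its stored value.
        lin.weight[~trainable] = frozen_values[~trainable]
```

Because the frozen entries are restored after each step, every forward pass sees them at their original values, and only w1, w2 (and the bias) actually change.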


K. Frank


The method provided is very useful and the problem has been solved. Thank you very much.

Hi Frank,

That was a mistake. Thanks for pointing it out.

I have a quick question about storing the weights and restoring them after training. Weights w3 to w100 still take part in training the whole network, i.e. they influence the training of other weights in the network. That is not what the original question intended.

Ideally, gradients should not flow through weights w3 to w100 during training.
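
One possible way to keep the frozen entries from ever being updated, rather than restoring them afterwards, is to zero their gradients with a tensor hook. This is an alternative technique, not the method Frank posted, and the sketch below assumes plain SGD and a hypothetical `mask` tensor chosen for illustration.

```python
import torch

torch.manual_seed(2021)
lin = torch.nn.Linear(2, 3)
before = lin.weight.detach().clone()

# 1.0 where an entry may train, 0.0 where it stays frozen (here only W[0, 0]).
mask = torch.ones_like(lin.weight)
mask[0, 0] = 0.0

# The hook runs during backward() and replaces the gradient with the masked one.
lin.weight.register_hook(lambda grad: grad * mask)

# Plain SGD: no momentum and no weight decay.
opt = torch.optim.SGD(lin.parameters(), lr=0.1)
lin(torch.ones(2)).sum().backward()
opt.step()
```

With a zero gradient, plain SGD leaves W[0, 0] untouched at every step. The frozen entry still takes part in the forward pass at its fixed value, and an optimizer with weight decay would still change it, so this sketch only covers the plain SGD case.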