I am building a model that consists of a pretrained subnet followed by some new layers. The subnet has already been trained, and now I want to train the new layers while keeping the subnet's parameters fixed.
I want to know how to exclude the subnet's parameters from model.parameters() when optimizing the net using Adam.
Any advice would be appreciated!
The transfer learning tutorial has a relevant section:
To summarize:

Set requires_grad=False for all parameters you do not wish to optimize. This avoids computing gradients for them:

for param in base_model.parameters():
    param.requires_grad = False

Call .parameters() on the part of the network you want to optimize:

optim.Adam(model.sub_network.parameters(), ...)
If your new layers aren't entirely contained in a single Module, you can collect their parameters using list concatenation:
parameters = []
parameters.extend(model.new_layer1.parameters())
parameters.extend(model.new_layer2.parameters())
optimizer = optim.Adam(parameters, ...)
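Putting the two steps together, here is a minimal runnable sketch. The module and layer names (subnet, new_layer1, new_layer2) are placeholders standing in for your own pretrained subnet and new layers:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical model: a pretrained subnet followed by two new layers
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.subnet = nn.Linear(4, 8)       # stands in for the pretrained subnet
        self.new_layer1 = nn.Linear(8, 8)
        self.new_layer2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.new_layer2(self.new_layer1(self.subnet(x)))

model = Model()

# 1) Freeze the pretrained subnet so no gradients are computed for it
for param in model.subnet.parameters():
    param.requires_grad = False

# 2) Pass only the new layers' parameters to the optimizer
parameters = []
parameters.extend(model.new_layer1.parameters())
parameters.extend(model.new_layer2.parameters())
optimizer = optim.Adam(parameters, lr=1e-3)

# One training step: only the new layers receive gradients and updates
out = model(torch.randn(2, 4))
out.sum().backward()
optimizer.step()
optimizer.zero_grad()
```

After the backward pass, the subnet's .grad attributes stay None while the new layers' parameters are populated and updated by the step.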
Thank you very much, it works
Hi, maybe it's silly, but if I have two subnetworks, say netA and netB, and only netA's parameters are passed to optim.Adam (optim.Adam(model.netA.parameters())) while netB's are not, then what happens to netB's parameters? Thanks!
The parameters update when you call optimizer.step(). Since you don't have an optimizer for netB's parameters, I think they won't change.
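This is easy to verify with a small sketch (netA and netB here are just placeholder linear layers):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Only netA's parameters are handed to the optimizer
netA = nn.Linear(2, 2)
netB = nn.Linear(2, 2)
optimizer = optim.Adam(netA.parameters(), lr=0.1)

before_B = netB.weight.clone()

out = netB(netA(torch.randn(1, 2)))
out.sum().backward()
optimizer.step()

# netB's weights are untouched by the step, but its gradients were
# still computed, because requires_grad is True by default
print(torch.equal(netB.weight, before_B))  # True
print(netB.weight.grad is not None)        # True
```

So the weights stay fixed, but gradient computation for netB still happens unless you also set requires_grad=False.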
Is it necessary to set .requires_grad = False? If we simply don't pass those layers' parameters to the optimizer, would it work? I understand we would save memory by not computing gradients, but is it necessary?
While optimizer.step() wouldn't update these parameters (since you've never passed them to the optimizer), they would still accumulate gradients.
This is of course wasteful, as these gradients are not needed (and Autograd could potentially stop the backward pass before reaching these parameters). Additionally, you would have to be careful if you plan to update these parameters in the future, e.g. by adding them via optimizer.add_param_group, since they would already contain (large) accumulated gradients.
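A small sketch of that pitfall. The layer here is a placeholder for parameters that were left out of the optimizer with requires_grad still True:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Parameters never passed to the optimizer, requires_grad left at True
layer = nn.Linear(1, 1)

# Each backward pass adds to .grad, and nothing ever zeroes it
for _ in range(3):
    layer(torch.ones(1, 1)).backward()

# d(out)/d(weight) = input = 1 per pass, accumulated over three passes
accumulated = layer.weight.grad.clone()
print(accumulated)  # tensor([[3.]])

# Before handing these parameters to an optimizer later via
# add_param_group, clear the stale accumulated gradients:
layer.weight.grad = None
layer.bias.grad = None

optimizer = optim.Adam(nn.Linear(1, 1).parameters(), lr=1e-3)
optimizer.add_param_group({"params": layer.parameters()})
```

Without clearing .grad first, the very first step() after add_param_group would apply the stale accumulated gradient.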
Hi, may I ask: if I set an intermediate layer's weights to requires_grad = False, will gradients still backpropagate through that layer to the earlier layers so they get proper updates?
Yes, this will work, as seen here:

import torch
import torch.nn as nn

# setup
model = nn.Sequential(
    nn.Linear(1, 1),
    nn.Linear(1, 1),
    nn.Linear(1, 1)
)

# freeze middle layer
for param in model[1].parameters():
    param.requires_grad = False

# calculate gradients
model(torch.randn(1, 1)).backward()

# check gradients
for name, param in model.named_parameters():
    print(name, param.grad)

> 0.weight tensor([[0.0005]])
  0.bias tensor([0.0335])
  1.weight None
  1.bias None
  2.weight tensor([[0.6302]])
  2.bias tensor([1.])