How do I configure the optimizer so that it only trains the neck and head during warmup, and then trains the backbone as well during the full training?
You could freeze the backbone parameters by setting their .requires_grad attributes to False during the warmup iterations, and then back to True for the full training. This would make sure that these parameters won't get any gradients and the optimizer won't update them in its step() calls.
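A minimal sketch of this approach, assuming a hypothetical toy model whose `backbone` and `head` submodule names are illustrative only:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; "backbone" and "head" are stand-in names.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.backbone(x))

model = ToyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Warmup: freeze the backbone so it receives no gradients.
for p in model.backbone.parameters():
    p.requires_grad = False

model(torch.randn(4, 8)).sum().backward()
optimizer.step()
# The backbone grads stay None, so the optimizer skips these parameters;
# only the head grads are populated and updated.

# Full training: unfreeze the backbone again.
for p in model.backbone.parameters():
    p.requires_grad = True
```

Note that the optimizer skips any parameter whose `.grad` is `None`, so frozen parameters are simply left untouched during warmup.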
Thank you for your answer. What about adding the backbone's param group to the optimizer after the warmup is finished? Which one is the better practice?
Both should work equally well and it might just depend on your personal coding style / preference.
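The alternative mentioned above can be sketched as follows; the separate `backbone`/`head` modules and the learning rates are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical modules standing in for the real backbone and head.
backbone = nn.Linear(8, 8)
head = nn.Linear(8, 2)

# During warmup the optimizer only knows about the head parameters.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# ... run warmup iterations, training only the head ...

# After warmup, register the backbone parameters as a new param group,
# optionally with its own learning rate.
optimizer.add_param_group({"params": backbone.parameters(), "lr": 1e-4})
```

One practical advantage of `add_param_group` is that it lets you give the backbone its own hyperparameters (e.g. a lower learning rate) once it joins the training.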
I have the same problem, which is unfortunately still unsolved.
I have modelA as the backbone and modelB as an auxiliary output head, which means my whole model has two outputs. What I want to do is freeze the pre-trained backbone and train modelB only. But either way you suggested would touch the weights in the backbone. Any suggestions?
If modelB is "on top" of modelA, you could still freeze the parameters of modelA and just train modelB. Why would the modelA weights be touched?
Thanks for the reply. Yes, it is exactly as you said. Here is a piece of code, which may help locate the problem.
(BTW, the PyTorch version is 1.4.0)
# create graph and load weights for modelA
checkpoint = torch.load(model_path, map_location=device)
modelA = BackboneNet().to(device)
modelA.load_state_dict(checkpoint)
modelB = AuxNet().to(device)

# freeze modelA
for param in modelA.parameters():
    param.requires_grad = False

# assign only modelB's parameters to the optimizer
optimizer = torch.optim.Adam(modelB.parameters())
It seems like adding model.eval() after setting requires_grad works! Thanks @ptrblck for the help~
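This makes sense: setting `requires_grad=False` only stops gradient updates, but BatchNorm layers still update their running statistics during forward passes in train mode, so a frozen backbone can still change behavior unless it is put into eval mode. A small sketch of the effect, using a made-up Conv+BatchNorm backbone:

```python
import torch
import torch.nn as nn

# Hypothetical frozen backbone containing a BatchNorm layer.
bn_backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
for p in bn_backbone.parameters():
    p.requires_grad = False

before = bn_backbone[1].running_mean.clone()

# In train mode the running stats are still updated by forward passes,
# even though all parameters are frozen.
bn_backbone.train()
bn_backbone(torch.randn(2, 3, 16, 16))
stats_changed = not torch.equal(before, bn_backbone[1].running_mean)

# In eval mode the running stats stay fixed.
frozen = bn_backbone[1].running_mean.clone()
bn_backbone.eval()
bn_backbone(torch.randn(2, 3, 16, 16))
stats_fixed = torch.equal(frozen, bn_backbone[1].running_mean)
```

So for a fully frozen backbone, combining `requires_grad = False` with `.eval()` on that submodule is the safe pattern.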