How do I configure the optimizer so that it only trains the neck and head during warmup, and then trains the backbone as well during the full training?
You could freeze the backbone parameters by setting their .requires_grad attributes to False during the warmup iterations, and then back to True for the full training. This would make sure that these parameters won't get any gradients and the optimizer won't update them in its step() calls.
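A minimal sketch of this approach, assuming a hypothetical toy model whose `backbone` and `head` submodule names are illustrative only:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; "backbone" and "head" are stand-in names.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head = nn.Linear(8, 2)

    def forward(self, x):
        return self.head(self.backbone(x))

model = ToyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Warmup: freeze the backbone so it receives no gradients.
for p in model.backbone.parameters():
    p.requires_grad = False

model(torch.randn(4, 8)).sum().backward()
optimizer.step()
# The backbone grads stay None, so the optimizer skips these parameters;
# only the head grads are populated and updated.

# Full training: unfreeze the backbone again.
for p in model.backbone.parameters():
    p.requires_grad = True
```

Note that the optimizer skips any parameter whose `.grad` is `None`, so frozen parameters are simply left untouched during warmup.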
Thank you for your answer. What about adding the backbone's param group to the optimizer after the warmup is finished? Which one is the better practice?
Both should work equally well and it might just depend on your personal coding style / preference.
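The alternative mentioned above can be sketched as follows; the separate `backbone`/`head` modules and the learning rates are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical modules standing in for the real backbone and head.
backbone = nn.Linear(8, 8)
head = nn.Linear(8, 2)

# During warmup the optimizer only knows about the head parameters.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# ... run warmup iterations, training only the head ...

# After warmup, register the backbone parameters as a new param group,
# optionally with its own learning rate.
optimizer.add_param_group({"params": backbone.parameters(), "lr": 1e-4})
```

One practical advantage of `add_param_group` is that it lets you give the backbone its own hyperparameters (e.g. a lower learning rate) once it joins the training.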
I have the same problem, which is unfortunately still unsolved.
I have modelA as the backbone and modelB as an auxiliary output head, which means my whole model has two outputs. What I want to do is freeze the pre-trained backbone and train modelB only. But either way you suggested would touch the weights in the backbone. Any suggestions?
If modelB is "on top" of modelA, you could still freeze the parameters of modelA and just train modelB. Why would the modelA weights be touched?
Thanks for the reply. Yes, it is exactly as you said. Here is a piece of code, which may help locate the problem.
(BTW, the PyTorch version is 1.4.0)
# create graph and load weights for modelA
checkpoint = torch.load(model_path, map_location=device)
modelA = BackboneNet().to(device)
modelA.load_state_dict(checkpoint)
modelB = AuxNet().to(device)

# freeze modelA
for param in modelA.parameters():
    param.requires_grad = False

# assign only modelB's parameters to the optimizer
optimizer = torch.optim.Adam(modelB.parameters())
It seems like adding model.eval() after setting requires_grad works! Thanks @ptrblck for the help~
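This makes sense: setting `requires_grad=False` only stops gradient updates, but BatchNorm layers still update their running statistics during forward passes in train mode, so a frozen backbone can still change behavior unless it is put into eval mode. A small sketch of the effect, using a made-up Conv+BatchNorm backbone:

```python
import torch
import torch.nn as nn

# Hypothetical frozen backbone containing a BatchNorm layer.
bn_backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
for p in bn_backbone.parameters():
    p.requires_grad = False

before = bn_backbone[1].running_mean.clone()

# In train mode the running stats are still updated by forward passes,
# even though all parameters are frozen.
bn_backbone.train()
bn_backbone(torch.randn(2, 3, 16, 16))
stats_changed = not torch.equal(before, bn_backbone[1].running_mean)

# In eval mode the running stats stay fixed.
frozen = bn_backbone[1].running_mean.clone()
bn_backbone.eval()
bn_backbone(torch.randn(2, 3, 16, 16))
stats_fixed = torch.equal(frozen, bn_backbone[1].running_mean)
```

So for a fully frozen backbone, combining `requires_grad = False` with `.eval()` on that submodule is the safe pattern.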