I’m currently working on a face recognition project that requires writing multiple loss functions.
For the first loss, I use optim.SGD(model.parameters(),…) with cross-entropy loss, so the gradients for all parameters of the network are computed by autograd.
For the second loss, I would like to use my own function, which should first send the gradient to the last fully connected layer (model.classifier) and also backpropagate the gradients to all the other layers. Should I also use optim.SGD(model.parameters(),…) for this loss function, or can I use optim.SGD(model.classifier.parameters(),…) instead?
Thanks in advance!!
If you use optim.SGD(model.classifier.parameters(),…), that optimizer's step will only update your last layer, while the other layers will only be updated by the other optimizer's step.
I think you don’t have to use a second optimizer at all. You should be able to do something like this:
(loss_1 + loss_2).backward()
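To make that concrete, here is a minimal sketch of a full training step with a single optimizer. The TinyNet model, the random data, and the loss_2 formula are all placeholders I made up for illustration; only model.classifier and optim.SGD(model.parameters(),…) come from your post.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


# Hypothetical stand-in for the real network: a small feature layer
# followed by a final fully connected layer named `classifier`.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 3)

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))


model = TinyNet()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Random placeholder batch: 5 samples, 8 features, 3 classes.
x = torch.randn(5, 8)
target = torch.randint(0, 3, (5,))

logits = model(x)
loss_1 = nn.functional.cross_entropy(logits, target)
# Placeholder for the custom loss (an assumption): any differentiable
# function of the model output would be handled the same way.
loss_2 = logits.pow(2).mean()

optimizer.zero_grad()
# One backward pass accumulates gradients from both losses into every
# parameter that contributed to them, not just the classifier.
(loss_1 + loss_2).backward()
optimizer.step()
```

Since both losses depend on the output of the whole network, autograd sends gradients through model.classifier and on to the earlier layers without any extra work.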
Thanks for your reply! But if I have to set two different learning rates, then I need different optimizers, right?
Also, if I use two optimizers, will the performance be different?
Depending on your use case you could either use different param_groups within one optimizer or a second optimizer.
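For the param_groups route, a rough sketch could look like the following. The TinyNet model and the learning rate values are assumptions chosen just for the example; the per-group "lr" key overrides the optimizer-wide default.

```python
import torch
import torch.nn as nn
import torch.optim as optim


# Hypothetical stand-in for the real network, with a `classifier` layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 3)

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))


model = TinyNet()

# One optimizer, two parameter groups. The learning rates are illustrative.
optimizer = optim.SGD(
    [
        # No "lr" key here, so this group falls back to the default below.
        {"params": model.features.parameters()},
        # The last layer gets its own learning rate.
        {"params": model.classifier.parameters(), "lr": 1e-2},
    ],
    lr=1e-3,
    momentum=0.9,
)

for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: lr={group['lr']}, tensors={len(group['params'])}")
```

The training step stays the same as with a single group: zero_grad(), backward() on the combined loss, then step().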
If done correctly, the performance should not differ. However, each parameter should be optimized by only one optimizer: if two optimizers update the same parameter for different objectives, its gradients could point in different directions, which could cause oscillation without proper convergence.
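If you go with two optimizers, one way to keep each parameter in exactly one of them is to split the parameters by name. This is only a sketch under my own assumptions: TinyNet, the data, the learning rates, and the combined loss are placeholders.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


# Hypothetical stand-in for the real network, with a `classifier` layer.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 3)

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))


model = TinyNet()

# Everything that is not part of the classifier goes to the first optimizer.
backbone_params = [
    p for name, p in model.named_parameters()
    if not name.startswith("classifier.")
]
opt_backbone = optim.SGD(backbone_params, lr=1e-3)
opt_classifier = optim.SGD(model.classifier.parameters(), lr=1e-2)

# Sanity check: no parameter is owned by both optimizers.
ids_backbone = {id(p) for g in opt_backbone.param_groups for p in g["params"]}
ids_classifier = {id(p) for g in opt_classifier.param_groups for p in g["params"]}
assert ids_backbone.isdisjoint(ids_classifier)

# Placeholder batch and losses (assumptions for illustration only).
x = torch.randn(5, 8)
target = torch.randint(0, 3, (5,))
logits = model(x)
loss = nn.functional.cross_entropy(logits, target) + logits.pow(2).mean()

opt_backbone.zero_grad()
opt_classifier.zero_grad()
loss.backward()
# Each optimizer updates only its own, non-overlapping set of parameters.
opt_backbone.step()
opt_classifier.step()
```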