Why do the optimizers require a positive learning rate by default?
In some cases, the loss is maximized for one network and minimized for another. It seems ideal to give one optimizer a positive learning rate and the other a negative learning rate, instead of forwarding the input once more to compute
This is not a solution; I am asking the same question again because I am facing the same problem. If I first call
optimizer_for_one_part_of_the_model.step()
some weights are changed, and then for the maximization of the other part:
This time the recomputed gradient will not be the same, because some weights of the same network have already been changed.
The ideal case would be to use the same gradient to maximize one part and minimize the other.
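To make the problem concrete, here is a minimal sketch of the effect. The two `Linear` modules, the names `net_min` / `net_max` and the squared-output objective are made up for illustration and are not from anyone's actual model. It only shows that the gradient for the second part changes once the first part has taken a step.

```python
import torch

torch.manual_seed(0)

# Hypothetical setup: two parts of one model sharing a single objective.
net_min = torch.nn.Linear(4, 4)  # part that should minimize the objective
net_max = torch.nn.Linear(4, 1)  # part that should maximize the objective
opt_min = torch.optim.SGD(net_min.parameters(), lr=0.1)

x = torch.randn(8, 4)


def objective():
    # Toy objective, chosen only so that both parts receive gradients.
    return net_max(net_min(x)).pow(2).mean()


# Gradient of net_max computed before net_min is updated.
net_min.zero_grad()
net_max.zero_grad()
objective().backward()
grad_before = net_max.weight.grad.clone()

opt_min.step()  # this changes net_min's weights

# Second forward/backward pass: net_max now sees a different gradient.
net_min.zero_grad()
net_max.zero_grad()
objective().backward()
grad_after = net_max.weight.grad.clone()

gradients_match = torch.allclose(grad_before, grad_after)
```

In this toy setup `gradients_match` should come out `False`, which is the mismatch described above.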
Well, it seems the use case is rare enough that wanting a negative learning rate is assumed to be an error. In the end, even the name loss suggests that you’re trying to minimize it.
But here is an easy workaround: you could set the learning rate to a negative value after initializing the optimizer (
opt.param_groups[0]['lr'] *= -1 or loop over the
param_groups if you have several / pick the one you want to ascend with), preferably with a comment explaining what you are up to. But you are on your own as to whether optimizers work well with it (SGD and Adam probably should).
If you then do wildly successful work with it, maybe one day the restriction will be lifted.
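For reference, the workaround might look roughly like the sketch below. It uses the same hypothetical `net_min` / `net_max` toy setup (my own names, not from the thread) and plain SGD; I have not checked how other optimizers behave with a negative value. The point is that a single `backward()` feeds both optimizers, so both steps use the same gradient.

```python
import torch

torch.manual_seed(0)

# Hypothetical setup: two parts of one model sharing a single objective.
net_min = torch.nn.Linear(4, 4)  # descends on the loss
net_max = torch.nn.Linear(4, 1)  # ascends on the loss

opt_min = torch.optim.SGD(net_min.parameters(), lr=0.1)
opt_max = torch.optim.SGD(net_max.parameters(), lr=0.1)

# Flip the sign after construction, since a negative lr is rejected at init.
for group in opt_max.param_groups:
    group["lr"] *= -1  # deliberate: gradient ascent for net_max

x = torch.randn(8, 4)
loss = net_max(net_min(x)).pow(2).mean()  # toy objective

opt_min.zero_grad()
opt_max.zero_grad()
loss.backward()  # one backward pass, shared by both optimizers

# Keep copies so the direction of the update can be checked afterwards.
w_before = net_max.weight.detach().clone()
g = net_max.weight.grad.clone()

opt_min.step()  # w <- w - 0.1 * grad
opt_max.step()  # w <- w + 0.1 * grad, because lr is now -0.1
```

With plain SGD (no momentum or weight decay) the `net_max` weights move along the gradient rather than against it. For optimizers with more internal state, it is worth checking the update direction the same way before trusting it.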