Use the same gradient to maximize one part of a model and minimize another part

I want to calculate the gradient once and use that same gradient to minimize one part and maximize another part of the same network (a kind of adversarial setup). Ideally, there would be two optimizers, each responsible for one part of the network/model, with one of them using a negative learning rate. But it seems that PyTorch does not allow negative learning rates, as shown below.
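For illustration, a quick check that the optimizer constructor rejects a negative learning rate (the nn.Linear here is just a throwaway source of parameters):

```python
import torch
import torch.nn as nn

params = nn.Linear(10, 1).parameters()  # throwaway parameters, just for the check
torch.optim.SGD(params, lr=-1e-2)       # raises ValueError (e.g. "Invalid learning rate: -0.01")
```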

In this case, what I am doing is:

loss.backward()
optimizer_for_one_part_of_the_model.step()

and then

(-loss).backward()

The problem is that the gradient calculated this second time will not be the same (it is sign-flipped, of course, but the values also differ), because some weights of the same network (same computation graph) have already been changed by the first optimizer step. Ideally, though, I want to use the flipped version of the previous gradient.
How can I achieve this?

As I wrote on the other thread:

Well, it seems the use case is rare enough that wanting a negative learning rate is assumed to be an error. After all, even the name loss suggests that you’re trying to minimize it.

But here is an easy workaround: you could set the learning rate to a negative value after initializing the optimizer (opt.param_groups[0]['lr'] *= -1, or loop over the param_groups if you have several / pick the one you want to ascend with), preferably with a comment explaining what you are up to. But you are on your own as to whether the optimizers work well with it (SGD and Adam probably should).
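A minimal sketch of that workaround, assuming a hypothetical two-part model (part_a should descend, part_b should ascend) and plain SGD:

```python
import torch
import torch.nn as nn

# Hypothetical two-part model: part_a is trained to minimize the loss,
# part_b is trained to maximize it.
part_a = nn.Linear(10, 10)
part_b = nn.Linear(10, 1)

opt_a = torch.optim.SGD(part_a.parameters(), lr=1e-2)
opt_b = torch.optim.SGD(part_b.parameters(), lr=1e-2)

# Workaround: flip the sign of the learning rate after construction,
# since the constructor rejects negative values. opt_b will now ascend.
for group in opt_b.param_groups:
    group['lr'] *= -1

x = torch.randn(4, 10)
loss = part_b(part_a(x)).mean()

opt_a.zero_grad()
opt_b.zero_grad()
loss.backward()   # one backward pass, gradients shared by both parts

opt_a.step()      # gradient descent on part_a
opt_b.step()      # gradient "ascent" on part_b via the negative learning rate
```

Because there is only a single backward pass, both steps use exactly the same gradients; only the sign of the learning rate differs.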

If you then do wildly successful work with it, maybe one day the restriction will be lifted. 🙂

Best regards

Thomas