Gradient Ascent and Gradient Modification/Modifying Optimizer instead of Grad_weight

Hi All,
I have a few questions related to the topic of modifying gradients and the optimizer. I’m wondering if there is an easy way to perform gradient ascent instead of gradient descent. For example, this would correspond to replacing grad_weight by -grad_weight in linear layer definition as seen in class LinearFunction(Function): from the Extending PyTorch page. My concern here is that this will mess up a downstream function that requires grad_weight instead of -grad_weight, or is this not a concern at all? A suggestion made to me was to try to modify the optimizer. Is there a simple way to go about doing W + dW instead of W - dW in the optimizer? I can’t really tell from the source code for SGD or ADAM.

Thanks for reading!


The simplest way to do gradient ascent on a loss L is to do gradient descent on -L . :smiley:

1 Like

That is an interesting solution. I think I need to further clarify my original question. I would like to include a negative sign on the updates to the weights, and this corresponds to changing grad_weight to -grad_weight, while grad_input and grad_bias are left untouched. However, I am wary of unintended consequences of doing something like this to the gradients, and was wondering if there was an easy way to change the optimizer such that it performed gradient ascent(W + dW) for the non last layer weights specifically, but left the other parameters alone?

In that case I guess you will have to create your custom optimizer to handle that. With one group for the descent part and one group for the ascent part for example.