Using multiple optimizers

mikayagoda · December 28, 2022, 8:39am

Hi everyone,
I need to implement the following:
1. Optimize a perturbation delta (a tensor) that is added to the input of a neural network, in order to create an adversarial example for the model.
2. Then optimize the model's weights to make it robust to this specific input (the adversarial example).
3. Finally, use the model for a downstream task.
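A minimal sketch of these three steps. It assumes a hypothetical toy classifier, a single labeled batch, plain SGD for both phases, and a hypothetical elementwise bound `eps` on delta; none of these come from the original setup. The point it illustrates is that every optimizer step is followed by a new forward pass, so no backward call ever uses activations computed before an in-place update.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: a small classifier, one input batch and its labels.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
x = torch.randn(8, 16)
y = torch.randint(0, 3, (8,))
loss_fn = nn.CrossEntropyLoss()

# Step 1: optimize the perturbation delta while the weights stay fixed.
delta = torch.zeros_like(x, requires_grad=True)
opt_delta = torch.optim.SGD([delta], lr=0.1)
eps = 0.5  # hypothetical bound on each element of delta
for p in model.parameters():
    p.requires_grad_(False)  # freeze weights instead of relying on lr=0
for _ in range(10):
    opt_delta.zero_grad()
    # Fresh forward pass every iteration, so the graph matches the current delta.
    adv_loss = -loss_fn(model(x + delta), y)  # maximize the model's loss
    adv_loss.backward()
    opt_delta.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)
for p in model.parameters():
    p.requires_grad_(True)

# Step 2: optimize the weights on the now fixed adversarial example.
x_adv = (x + delta).detach()  # cut the graph back to delta
opt_model = torch.optim.SGD(model.parameters(), lr=0.05)
for _ in range(10):
    opt_model.zero_grad()
    robust_loss = loss_fn(model(x_adv), y)  # new forward pass with the current weights
    robust_loss.backward()
    opt_model.step()

# Step 3: use the updated model downstream without building a graph.
model.eval()
with torch.no_grad():
    preds = model(x_adv).argmax(dim=1)
print(preds)
```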

I tried to implement this using three separate optimizers and got the following error:
“RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [768, 512]] is at version 101; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!”

After some investigation, I realized that the problem is probably that the second optimizer modified the model's weights in place (in general, optimizer.step() updates its parameters in place).
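The error can be reproduced with a tiny example (the model, data and loss are hypothetical): a loss is computed, an optimizer updates the weights, and then the old loss is backpropagated again. Because delta requires grad, autograd saves the weight during the forward pass to compute the gradient with respect to the input, and optimizer.step() then modifies that saved tensor in place.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
x = torch.randn(8, 4)
# delta requires grad, so autograd saves the weight to compute d(loss)/d(delta).
delta = torch.zeros_like(x, requires_grad=True)
opt_model = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(x + delta).pow(2).sum()  # forward pass with the OLD weights
loss.backward(retain_graph=True)      # keep the graph so it can be reused
opt_model.step()                      # in-place update bumps the weight's version

error = None
try:
    loss.backward()                   # reuses activations saved before the update
except RuntimeError as e:
    error = e
print(error)
```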

I also tried using a single optimizer that holds all of these parameters and, at certain times, setting the learning rate of the parameters that should stay fixed to 0.
This resulted in the same error.
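Setting the learning rate to 0 does not avoid the problem, because optimizer.step() still writes to the parameter in place: the values stay the same, but autograd's version counter (the "is at version ...; expected version ..." part of the error) is still incremented. A small check, assuming plain SGD and reading the internal `_version` attribute purely for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
# Learning rate 0 for the parameters that should stay "static".
opt = torch.optim.SGD(model.parameters(), lr=0.0)

model(torch.randn(8, 4)).sum().backward()
w_before = model.weight.detach().clone()
version_before = model.weight._version  # internal autograd version counter

opt.step()  # still performs an in-place update, just with a zero step size

values_unchanged = torch.equal(model.weight.detach(), w_before)
version_bumped = model.weight._version > version_before
print(values_unchanged, version_bumped)
```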

I saw that one way to use multiple optimizers is to call backward() for all losses first and only then call each optimizer's step().
But that doesn't fit this case, since the steps above have to run one after another.
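For reference, the "backward first, step later" pattern looks roughly like this. The toy model and both objectives are hypothetical; the `inputs=` argument of `Tensor.backward` limits each backward pass to the tensors owned by one optimizer. It avoids the error because no parameter is modified until all gradients have been computed, but the weight gradients are then based on delta before its update, which is why it does not match the sequential procedure.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
x = torch.randn(8, 4)
delta = torch.zeros_like(x, requires_grad=True)
opt_delta = torch.optim.SGD([delta], lr=0.1)
opt_model = torch.optim.SGD(model.parameters(), lr=0.1)

opt_delta.zero_grad()
opt_model.zero_grad()
out = model(x + delta)         # single forward pass shared by both losses
loss_model = out.pow(2).sum()  # hypothetical objective for the weights
loss_delta = -loss_model       # hypothetical adversarial objective for delta

# 1) Run all backward passes while the saved activations are still valid.
loss_delta.backward(inputs=[delta], retain_graph=True)
loss_model.backward(inputs=list(model.parameters()))

# 2) Only then apply the in-place updates.
opt_delta.step()
opt_model.step()
# The weight gradients above were computed with the OLD delta.
```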

Does anyone have an idea how to overcome this issue?

Thank you!

ptrblck · December 28, 2022, 7:47pm

You might be running into a similar issue to the one described here. After one optimizer updates (some) parameters, the corresponding forward activations become stale, since they were computed with the old parameter set. Calculating the gradients with these stale forward activations and the updated parameters would be wrong, so autograd raises this error.
The linked post describes the issue in more detail.
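Applied to the procedure in the question, this means running a new forward pass after each optimizer step instead of reusing a loss computed earlier. A minimal sketch with a hypothetical toy model and objective:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
x = torch.randn(8, 4)
delta = torch.zeros_like(x, requires_grad=True)
opt_delta = torch.optim.SGD([delta], lr=0.1)
opt_model = torch.optim.SGD(model.parameters(), lr=0.1)

# 1) Update delta (only delta is registered with opt_delta).
opt_delta.zero_grad()
adv_loss = -model(x + delta).pow(2).sum()
adv_loss.backward()
opt_delta.step()  # delta changes in place, so adv_loss's graph is now stale

# 2) Update the weights: do NOT reuse adv_loss. Run a new forward pass
#    with the updated delta, detached so no graph leads back to it.
opt_model.zero_grad()  # clears the weight grads left over from step 1
robust_loss = model(x + delta.detach()).pow(2).sum()
robust_loss.backward()
opt_model.step()  # weights change in place, so robust_loss's graph is now stale

# Any further gradient computation needs yet another fresh forward pass.
```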