Using multiple optimizers

You might be running into a similar issue to the one described here. After one optimizer updates (some of the) parameters, the corresponding forward activations become stale, since they were created with the old parameter set. Calculating gradients from these stale forward activations together with the updated parameters is wrong and will raise the error.
The linked post describes the issue in more detail.
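Below is a minimal sketch of the failure mode and one way to avoid it. The module names, sizes, and SGD optimizers are placeholders I'm assuming for illustration, not taken from the original post:

```python
import torch
import torch.nn as nn

# Two modules chained in a single computation graph, each with its own optimizer.
model_a = nn.Linear(4, 4)
model_b = nn.Linear(4, 1)
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Forward pass builds one graph through both modules.
out = model_b(model_a(x))
loss = nn.functional.mse_loss(out, target)

# Problematic order: stepping opt_a changes model_a's parameters in place,
# so the graph's stored tensors no longer match them and a second backward
# through the same graph raises a RuntimeError.
loss.backward(retain_graph=True)
opt_a.step()
# loss.backward()  # <- fails: stale activations vs. updated parameters

# Working order: finish all backward passes first (or recompute the forward
# pass with the current parameters), then step both optimizers.
opt_a.zero_grad()
opt_b.zero_grad()
out = model_b(model_a(x))                 # fresh forward pass
loss = nn.functional.mse_loss(out, target)
loss.backward()
opt_a.step()
opt_b.step()
```

In short: either call every needed `backward()` before any `optimizer.step()`, or rerun the forward pass after stepping so the activations match the current parameters.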