In the “second forward-backward pass” you are still using the loss value that was calculated by criterion(scores, labels) in the first pass, and calling backward on it tries to compute the gradients from the forward activations stored for that loss. However, between the first loss.backward() call and the second one you’ve already updated (some) parameters, so those stored forward activations are stale. Computing gradients from stale forward activations together with the already updated parameters would give wrong results, and thus the error is raised.
This post describes the issue in more detail.
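
As a rough illustration, here is a minimal sketch of the failing pattern and one possible fix. The model, data, and optimizer are hypothetical stand-ins, not your actual code. I'm also assuming the first backward call used retain_graph=True; without it you would hit a different error about backpropagating through the graph a second time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the model, data, and criterion in the question.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(16, 4)
labels = torch.randint(0, 3, (16,))

# First forward-backward pass.
scores = model(inputs)
loss = criterion(scores, labels)
loss.backward(retain_graph=True)  # assumption: the graph is kept alive
optimizer.step()                  # updates the parameters in place

# Failing pattern: the old loss still points at the graph built from the
# parameters as they were before optimizer.step().
stale_error = None
try:
    loss.backward()
except RuntimeError as err:
    stale_error = err
    print("second backward failed:", type(err).__name__)

# Possible fix: rerun the forward pass so the activations match the
# updated parameters, then call backward on the new loss.
optimizer.zero_grad()
scores = model(inputs)
loss = criterion(scores, labels)
loss.backward()
optimizer.step()
```

Whether recomputing the forward pass is the right fix depends on what the second pass is supposed to do. If both backward calls are meant to use the same parameters, moving optimizer.step() after the second backward call would be the alternative.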