Update weight with same network's output

It seems your code tries to calculate the gradients in the second backward pass using “stale” intermediate forward activations. Those activations were computed before the parameters were updated, so they no longer match the current parameters, and gradients computed from them would be wrong. This post explains it in more detail.
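Since the original code is not shown here, the following is only a minimal sketch of the pattern as I understand it, with a made-up toy model, data, and two hypothetical losses (`model`, `x`, `target`, `update` are all assumptions, not your names). The first part reuses one forward pass for two backward calls with an optimizer step in between; in recent PyTorch versions I would expect this to raise a `RuntimeError` about a tensor modified by an in-place operation. The second part shows one possible fix: rerun the forward pass after each parameter update so the activations are fresh.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup (assumed for illustration only)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(16, 4)
target = torch.randn(16, 1)

# --- Problematic pattern: one forward pass, two backward passes ---
out = model(x)
loss1 = (out - target).pow(2).mean()
loss2 = out.abs().mean()

optimizer.zero_grad()
loss1.backward(retain_graph=True)
optimizer.step()  # parameters are modified in place here

stale_failed = False
try:
    # The graph behind loss2 still refers to the pre-update parameters
    # and activations, so autograd is expected to complain.
    loss2.backward()
except RuntimeError as err:
    stale_failed = True
    print("stale backward failed:", str(err)[:80])

# --- One possible fix: a fresh forward pass before every backward ---
def update(loss_fn):
    optimizer.zero_grad()
    fresh_out = model(x)  # recomputed with the current parameters
    loss = loss_fn(fresh_out)
    loss.backward()
    optimizer.step()
    return loss.item()

weight_before = model[0].weight.detach().clone()
l1 = update(lambda o: (o - target).pow(2).mean())
l2 = update(lambda o: o.abs().mean())
print("losses after fresh forward passes:", l1, l2)
```

If both losses should instead be based on the same parameters, another option would be to sum them and call `backward()` once before the single `optimizer.step()`. Which of the two is appropriate depends on what your training loop is meant to do.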