Based on the description, it seems you are trying to use stale intermediate activations (i.e. activations saved during a forward pass that ran before the optimizer step) to calculate the gradients for already updated parameters, which would raise this error.
This post explains the issue in more detail using a GAN training approach.
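
As a rough illustration, here is a minimal sketch of how this can happen. It assumes the error in question is autograd's `RuntimeError` about a tensor being modified by an inplace operation; the error is not quoted here, so treat that as my guess. The model, the two losses, and all names are made up for the example and are not taken from your code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup, only for illustration
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)

# Single forward pass; both losses share the same saved activations/parameters
out = model(x)
loss1 = out.mean()
loss2 = out.pow(2).mean()

optimizer.zero_grad()
loss1.backward(retain_graph=True)
optimizer.step()  # updates the parameters in place

# The graph of loss2 still refers to the parameters as they were before
# the step, so autograd detects the in-place change and refuses to continue.
stale_error = None
try:
    loss2.backward()
except RuntimeError as e:
    stale_error = e
print(type(stale_error).__name__)

# One possible fix: rerun the forward pass after the update, so the
# activations match the current parameters.
optimizer.zero_grad()
out = model(x)
loss2 = out.pow(2).mean()
loss2.backward()
optimizer.step()
```

Alternatively, if both losses should be computed from the same forward pass, you could call both `backward()` operations (or sum the losses) before calling `optimizer.step()`. Which option is right depends on what your training loop is supposed to do.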