You might be computing the gradients from stale forward activations after the parameters have already been updated, as described in this post.
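To make the mismatch concrete, here is a minimal, library-free sketch of what goes wrong. It is only an illustration under assumptions: the toy two-layer scalar model, the loss, and every name in it (`forward`, `grad_w1`, `w2_updated`, and so on) are hypothetical and are not taken from your code. The backward pass for `w1` needs the value of `w2` that the forward pass actually used; if `w2` is updated in between, the gradient no longer matches the saved activations.

```python
# Toy model (hypothetical): h = w1 * x, y = w2 * h, loss = 0.5 * y**2.
# No autograd library is used; the gradients are written out by hand.

def forward(w1, w2, x):
    h = w1 * x  # activation saved for the backward pass
    y = w2 * h
    return h, y

def grad_w1(w2, x, y):
    # dloss/dw1 = dloss/dy * dy/dh * dh/dw1 = y * w2 * x
    return y * w2 * x

w1, w2, x, lr = 2.0, 3.0, 1.0, 0.1

# Forward pass: h and y are computed with w2 = 3.0.
h, y = forward(w1, w2, x)

# Correct order: run backward with the same w2 the forward pass saw.
correct = grad_w1(w2, x, y)

# Problematic order: a parameter update changes w2 before backward runs.
grad_w2 = y * h                     # dloss/dw2 = y * h
w2_updated = w2 - lr * grad_w2      # plain gradient-descent step
stale = grad_w1(w2_updated, x, y)   # y is stale, w2 is already updated

print("gradient with matching parameters:", correct)
print("gradient with updated parameters: ", stale)
```

The two printed values differ, which is the inconsistency described above. Some frameworks detect this situation and raise an error instead of returning the mismatched gradient. The usual fix is to finish the backward pass, or rerun the forward pass, before the parameters are updated.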