You are running into this issue because you are trying to use stale forward activations, which would result in an incorrect gradient computation.
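
To illustrate why stale activations give a wrong gradient, here is a minimal plain-Python sketch. It is not your code and does not use any framework: the two-weight model, the names (`forward`, `grad_w2`, `h_cached`) and the numbers are all made up for illustration, and I am assuming the parameters were changed between the forward and backward pass.

```python
# Hypothetical toy model: h = w1 * x, y = w2 * h, loss = y.
def forward(w1, w2, x):
    h = w1 * x          # activation that the backward pass will need
    return h, w2 * h

def grad_w2(h):
    # d(loss)/d(w2) = h, so this gradient depends on the cached activation
    return h

x, w1, w2 = 2.0, 3.0, 5.0
h_cached, loss = forward(w1, w2, x)   # activation computed with the old w1

w1 = 4.0  # assumed: a parameter is updated after the forward pass

stale_grad = grad_w2(h_cached)        # backward with the stale activation
h_fresh, _ = forward(w1, w2, x)       # rerun forward with the current weights
fresh_grad = grad_w2(h_fresh)         # gradient consistent with the current w1

print(stale_grad, fresh_grad)
```

The two values differ because `h_cached` was computed before `w1` changed. Rerunning the forward pass with the current parameters before calling backward is what keeps the gradient consistent.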