You are most likely running into this issue: the gradients cannot be computed because the forward activations are stale after the parameter update.
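The post does not name a framework. The sketch below assumes PyTorch, where this failure usually shows up as a `RuntimeError` saying that a variable needed for gradient computation "has been modified by an inplace operation". The model, the two losses, and the `sgd_update` helper are hypothetical and exist only for illustration. `sgd_update` is a plain in-place SGD step; `optimizer.step()` also updates parameters in place. The example shows the failing pattern (one forward pass, a parameter update, then a second backward) and two common ways around it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-layer model: the hidden activation requires grad, so autograd
# saves the second layer's weight to compute the gradient w.r.t. that activation.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(16, 4)
lr = 0.1


def sgd_update(module: nn.Module) -> None:
    # Hypothetical helper: in-place SGD step, which bumps each parameter's version counter.
    with torch.no_grad():
        for p in module.parameters():
            p -= lr * p.grad


def stale_graph_fails() -> bool:
    """Reuse one forward pass for two backward calls with an update in between."""
    model.zero_grad()
    out = model(x)
    loss1 = out.pow(2).mean()
    loss2 = out.abs().mean()
    loss1.backward(retain_graph=True)  # keep saved tensors for the second backward
    sgd_update(model)                  # parameters saved by the graph change in place
    try:
        loss2.backward()               # graph still refers to the old parameter versions
    except RuntimeError as err:
        return "modified by an inplace operation" in str(err)
    return False


def backward_before_update() -> None:
    """Workaround 1: backpropagate every loss before touching the parameters."""
    model.zero_grad()
    out = model(x)
    (out.pow(2).mean() + out.abs().mean()).backward()
    sgd_update(model)


def recompute_after_update() -> None:
    """Workaround 2: run a fresh forward pass after the update."""
    model.zero_grad()
    model(x).pow(2).mean().backward()
    sgd_update(model)
    model.zero_grad()
    model(x).abs().mean().backward()  # new graph built from the updated weights
```

Which workaround fits depends on the intent. If both losses should be computed from the same (pre-update) parameters, sum them and call `backward()` once. If the second loss should see the updated parameters, as in alternating GAN updates, recompute the forward pass.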