You could be hitting this issue, which is raised when a backward pass tries to compute gradients using parameters that have already been updated, and therefore forward activations that are stale.
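As a rough illustration, here is a minimal sketch of how this situation can come up. It assumes PyTorch and that the error you see is autograd's in-place modification check; the toy model, the data, and the helper names (`make_setup`, `stale_backward_raises`, `recomputed_forward_runs`) are made up for this example and are not taken from your code.

```python
import torch
import torch.nn as nn


def make_setup():
    # Hypothetical toy model and data, used only for illustration.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(16, 4)
    return model, optimizer, x


def stale_backward_raises():
    """Backward through a graph whose parameters were already updated."""
    model, optimizer, x = make_setup()
    out = model(x)
    loss_a = out.mean()
    loss_b = out.pow(2).mean()

    loss_a.backward(retain_graph=True)
    optimizer.step()  # updates the parameters in place

    try:
        # The retained graph still refers to the old parameter values,
        # so autograd is expected to refuse this backward pass.
        loss_b.backward()
    except RuntimeError:
        return True
    return False


def recomputed_forward_runs():
    """Possible fix: rerun the forward pass after the parameter update."""
    model, optimizer, x = make_setup()
    loss_a = model(x).mean()
    loss_a.backward()
    optimizer.step()
    optimizer.zero_grad()

    loss_b = model(x).pow(2).mean()  # fresh forward with updated parameters
    loss_b.backward()
    optimizer.step()
    return all(p.grad is not None for p in model.parameters())
```

Depending on your use case, another option would be to call `backward()` on all losses first and only then call `optimizer.step()`, so that no backward pass ever sees updated parameters.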