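You might be running into an issue similar to the ones described here and here. Could you check whether you are trying to use stale gradients for already updated parameters, i.e. whether a `backward()` call goes through a forward pass that was computed before `optimizer.step()` changed the parameters?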
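To make the check concrete, here is a minimal sketch of the pattern I have in mind. It is not taken from your code: the tiny model, the two losses, and the function names are made up for illustration, and I am assuming a reasonably recent PyTorch release where autograd detects parameters that were modified in place after the forward pass.

```python
import torch
import torch.nn as nn


def make_setup():
    # Hypothetical toy model and data, only for illustration.
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(16, 4)
    return model, opt, x


def stale_backward_raises():
    model, opt, x = make_setup()
    out = model(x)                      # forward pass with the OLD parameters
    loss_a = out.mean()
    loss_b = (out ** 2).mean()

    opt.zero_grad()
    loss_a.backward(retain_graph=True)
    opt.step()                          # updates the parameters in place

    try:
        # This backward still needs the old weights saved in the graph,
        # but they were just overwritten by opt.step().
        loss_b.backward()
    except RuntimeError:
        return True
    return False


def fresh_forward_works():
    model, opt, x = make_setup()
    opt.zero_grad()
    model(x).mean().backward()
    opt.step()

    opt.zero_grad()
    out = model(x)                      # NEW forward pass with updated parameters
    (out ** 2).mean().backward()
    opt.step()
    return True
```

If your training loop looks like the first function, the usual ways out are to rerun the forward pass after `optimizer.step()` as in the second function, or to compute all gradients that depend on the old forward pass before calling `step()`.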