On Colab, torch.allclose() fails when I compare the autograd gradient with my hand-calculated gradient: the two values are very far apart. However, the issue disappears when I run the same code locally.
For more detail, you can see here.
Small numerical mismatches like these are usually expected due to limited floating-point precision. Are you checking your manual gradient calculation with the built-in gradcheck utility?
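For reference, a minimal sketch of what such a check could look like. The function `f` and its hand-derived gradient are hypothetical stand-ins, not the code from the original post, and the inputs are created in double precision because gradcheck compares against finite differences:

```python
import torch
from torch.autograd import gradcheck

def f(x):
    # Hypothetical function standing in for the one under test.
    return (x ** 3).sum()

# Double precision keeps the finite-difference estimate accurate enough.
x = torch.randn(5, dtype=torch.double, requires_grad=True)

# Compares the analytical (autograd) gradient with a numerical estimate.
gradcheck_ok = gradcheck(f, (x,), eps=1e-6, atol=1e-4)

# Compare autograd against the hand-calculated gradient d/dx sum(x^3) = 3 x^2.
(auto_grad,) = torch.autograd.grad(f(x), x)
manual_grad = 3 * x.detach() ** 2
manual_ok = torch.allclose(auto_grad, manual_grad)

print(gradcheck_ok, manual_ok)
```

If the check passes in double precision but the float32 comparison still fails in one environment, the mismatch is more likely a precision or tolerance issue than an error in the derivation.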
But the problem only happens on Colab, not locally, which is weird.
As for the gradcheck function, according to the docs it uses allclose() as well: "The check between numerical and analytical gradients uses allclose()."
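Since both checks go through allclose(), it may help to look at the actual difference and at the tolerances rather than only the True/False result. The sketch below uses made-up tensors `a` and `b` (not the gradients from this thread) and reproduces the documented criterion `|a - b| <= atol + rtol * |b|` by hand:

```python
import torch

# Illustrative values only: b differs from a by 1e-6 everywhere.
a = torch.tensor([0.0, 1.0, 100.0], dtype=torch.float64)
b = a + 1e-6

# The size of the mismatch is more informative than a bare boolean.
max_abs_diff = (a - b).abs().max().item()

# Default tolerances are rtol=1e-5 and atol=1e-8.
strict = torch.allclose(a, b)

# Looser tolerances, chosen here purely for illustration.
rtol, atol = 1e-4, 1e-5
loose = torch.allclose(a, b, rtol=rtol, atol=atol)

# The same criterion written out elementwise.
manual = bool(((a - b).abs() <= atol + rtol * b.abs()).all())

print(max_abs_diff, strict, loose, manual)
```

With the defaults, the element near zero fails because the relative term contributes almost nothing there and atol is only 1e-8. Printing the maximum difference on both Colab and the local machine would show whether the gap is a small precision effect or a genuinely different result.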