Discrepancy between theory and practice

Hi,

Running on the current Colab, this is what I see: the same thing as you, a difference of ~1e-6 for float. But after adding torch.set_default_dtype(torch.double) at the beginning, it goes down to ~1e-15. @InnovArul did you set it properly to double?
So it looks like this is the expected loss of precision from floating point numbers.
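As a minimal sketch of what I mean (a toy computation, not the notebook from this thread): two mathematically equivalent expressions round differently in float32, and the gap shrinks by many orders of magnitude in float64.

```python
import torch

def max_abs_discrepancy(dtype):
    torch.manual_seed(0)
    x = torch.randn(64, 128, dtype=dtype)
    w = torch.randn(128, 128, dtype=dtype)
    v = torch.randn(128, 32, dtype=dtype)
    # (x @ w) @ v and x @ (w @ v) are equal in exact arithmetic,
    # but accumulate rounding error differently in floating point.
    return ((x @ w) @ v - x @ (w @ v)).abs().max().item()

print(max_abs_discrepancy(torch.float32))  # small but visible difference
print(max_abs_discrepancy(torch.float64))  # many orders of magnitude smaller

# Or make double the default for every float tensor created afterwards:
torch.set_default_dtype(torch.double)
print(torch.randn(3).dtype)  # torch.float64
```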

Also note that the deeper your network is, the larger this difference will be, as most operations amplify a small difference that happened at the beginning. The sketch below illustrates this.
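Here is a small illustration of that effect (a hypothetical stack of linear + tanh layers, with made-up shapes): running the same input through more and more layers in float32 and comparing against a float64 reference, the gap grows with depth.

```python
import torch

torch.manual_seed(0)
x64 = torch.randn(16, 256, dtype=torch.float64)
# Scale weights so activations stay roughly O(1) through the stack.
weights64 = [torch.randn(256, 256, dtype=torch.float64) / 16 for _ in range(20)]

x32 = x64.float()
weights32 = [w.float() for w in weights64]

out64, out32 = x64, x32
for depth, (w64, w32) in enumerate(zip(weights64, weights32), start=1):
    out64 = torch.tanh(out64 @ w64)
    out32 = torch.tanh(out32 @ w32)
    # Compare the float32 path against the float64 reference at each depth.
    diff = (out32.double() - out64).abs().max().item()
    if depth % 5 == 0:
        print(f"depth {depth:2d}: max |float32 - float64| = {diff:.2e}")
```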
