I observe this behavior too.
I can see that the CNN and FC outputs are equal only up to an absolute tolerance (atol) of about 1e-5.

@albanD: do you have some comments on this?
With the double data type, I would expect the absolute tolerance to be much smaller than 1e-5. Am I missing something here?

Running on current colab, this is what I see:
The same thing as you: a difference of ~1e-6 for float. But after adding torch.set_default_dtype(torch.double) at the beginning, it goes down to ~1e-15. @InnovArul, did you properly set the dtype to double?
So it looks like this is the expected loss of precision from floating-point arithmetic.
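For reference, the ~1e-6 vs ~1e-15 gap lines up with the machine epsilon of each dtype. A quick check (using numpy's finfo here, since it reports the same IEEE-754 values that torch's float32/float64 use):

```python
import numpy as np

# Machine epsilon: the smallest relative rounding step for each IEEE-754 dtype.
# float32 rounds at ~1e-7, so end-to-end differences around 1e-6 are normal;
# float64 rounds at ~2e-16, matching the observed ~1e-15 gap.
print(np.finfo(np.float32).eps)  # → 1.1920929e-07
print(np.finfo(np.float64).eps)  # → 2.220446049250313e-16
```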

Also note that the deeper your network is, the larger this difference will be, as most operations amplify a small difference introduced at the beginning.
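A toy sketch of that amplification (a hypothetical stand-in for repeated layers, not the actual network from this thread): 1.1 is not exactly representable in float32, and that tiny representation error roughly doubles with every squaring, just as small differences compound through depth:

```python
import numpy as np

x32 = np.float32(1.1)   # stored with a tiny float32 rounding error
x64 = np.float64(1.1)   # same value at double precision
for depth in range(1, 6):
    x32, x64 = x32 * x32, x64 * x64
    # The relative gap between the two grows with each "layer" of computation.
    print(depth, abs(float(x32) - x64) / x64)
```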

No, contrary to numpy, we default to float, as it makes a big difference in runtime (especially on GPU) and provides enough precision for most (if not all) deep learning applications.
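To make the differing defaults concrete (numpy shown directly; the torch behavior in the comments is the one described above):

```python
import numpy as np

# numpy defaults to 64-bit floats:
print(np.array([1.1]).dtype)  # → float64

# torch, by contrast, defaults to 32-bit floats:
#   torch.tensor(1.1).dtype  ->  torch.float32
# and torch.set_default_dtype(torch.double) switches the default to float64.
```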