Discrepancy between theory and practice

Hi,

in theory, the two layers below (a Conv2d and an equivalent Linear) should give the same result, but they produce different (though similar) outputs:

import torch
from torch import nn

b = 20      # batch size
outc = 10   # output channels / output features
inc = 10    # input channels
res = 32    # spatial resolution

input_tensor = torch.randn((b, inc, res, res))

# Conv2d whose kernel covers the entire input, so the spatial output is 1x1
# and each output channel is a single dot product over all input values.
cnn_layer = nn.Conv2d(in_channels=inc, out_channels=outc, kernel_size=res)
out_cnn_layer = cnn_layer(input_tensor)

# Linear layer with the same weights and bias, applied to the flattened input.
fc_layer = nn.Linear(in_features=inc * res * res, out_features=outc)
fc_layer.weight.data = cnn_layer.weight.data.reshape(outc, -1)
fc_layer.bias.data = cnn_layer.bias.data
out_fc_layer = fc_layer(input_tensor.view(b, -1))

assert torch.allclose(out_cnn_layer.view(b, -1), out_fc_layer)
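
For context, a quick sanity check of why the two should match (a small sketch reusing the layers defined above):

print(cnn_layer.weight.shape)   # torch.Size([10, 10, 32, 32])
print(fc_layer.weight.shape)    # torch.Size([10, 10240]) == (outc, inc * res * res)
print(out_cnn_layer.shape)      # torch.Size([20, 10, 1, 1]): one value per output channel

Since kernel_size equals the input resolution, each Conv2d output channel is one dot product over the whole flattened input, which is exactly what the Linear layer computes with the reshaped weight.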

For me, this test passes!
PyTorch 1.6.0

Did you run it multiple times? Because my version is also 1.6.0.

I ran it 15 times (or more!) and all of them passed.
Oh wait! I missed an assert. Sorry for the misunderstanding.
I will check again.

I observe this behavior too.
I can see that the CNN and FC outputs are equal only up to an absolute tolerance (atol) of 1e-5.
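
For reference, a quick way to quantify the gap (a minimal sketch reusing out_cnn_layer, out_fc_layer, and b from the snippet above):

diff = (out_cnn_layer.view(b, -1) - out_fc_layer).abs().max()
print(diff)  # on the order of 1e-6 in float32

print(torch.allclose(out_cnn_layer.view(b, -1), out_fc_layer, atol=1e-5))  # True with a looser atol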

@albanD: do you have some comments on this?
With the double data type, I would expect the absolute tolerance to be even smaller than 1e-5. Am I missing something here?

Hi,

Running on current Colab, this is what I see:
The same thing as you: a difference of ~1e-6 for float. But after adding torch.set_default_dtype(torch.double) at the beginning, it goes down to ~1e-15. @InnovArul, did you set it properly to double?
So it looks like this is the expected loss of precision from floating point arithmetic.

Also note that the deeper your network is, the larger this difference will be, as most operations amplify a small difference that happened at the beginning.
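
For reference, a minimal sketch of the double-precision check (same setup as the original snippet, with the default dtype switched before any layer is created):

import torch
from torch import nn

torch.set_default_dtype(torch.double)  # new tensors and parameters are now float64

b, outc, inc, res = 20, 10, 10, 32
input_tensor = torch.randn((b, inc, res, res))

cnn_layer = nn.Conv2d(in_channels=inc, out_channels=outc, kernel_size=res)
fc_layer = nn.Linear(in_features=inc * res * res, out_features=outc)
fc_layer.weight.data = cnn_layer.weight.data.reshape(outc, -1)
fc_layer.bias.data = cnn_layer.bias.data

diff = (cnn_layer(input_tensor).view(b, -1) - fc_layer(input_tensor.view(b, -1))).abs().max()
print(diff)  # ~1e-15 instead of ~1e-6 with float32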


I see. Thanks @albanD. I did not set it to double; I assumed the default dtype on CPU was double.

No, contrary to NumPy, we default to float32, as it makes a big difference in terms of runtime (especially on GPU) and provides enough precision for most (if not all) deep learning applications.
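
A quick illustration of the differing defaults (assuming NumPy is installed):

import numpy as np
import torch

print(np.random.randn(3).dtype)  # float64: NumPy defaults to double precision
print(torch.randn(3).dtype)      # torch.float32: PyTorch defaults to single precision

torch.set_default_dtype(torch.double)
print(torch.randn(3).dtype)      # torch.float64 once the default is changed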