# Why conv2d computation results are all different even if their data types are all integer?

I’ve tried and compared three different methods for computing a convolution with a custom kernel in PyTorch. Their results differ, but I don’t understand why.

Setup code:

```
import torch
import torch.nn.functional as F

inp = torch.arange(3*500*700).reshape(1, 3, 500, 700).to(dtype=torch.float32)
wgt = torch.ones((1, 3, 3, 3)).to(dtype=torch.float32)
stride = 1
h = inp.shape[2] - wgt.shape[2] + 1  # 498
w = inp.shape[3] - wgt.shape[3] + 1  # 698
```

Method 1

```
out1 = torch.zeros((1, h, w)).to(dtype=torch.float32)
for o in range(1):
    for i in range(3):
        for j in range(h):
            for k in range(w):
                out1[o, j, k] = out1[o, j, k] + (inp[0, i, j*stride:j*stride+3, k*stride:k*stride+3] * wgt[0, i]).sum()
out1 = out1.to(dtype=torch.int)
```

Method 2

```
inp_unf = F.unfold(inp, (3, 3))
out_unf = inp_unf.transpose(1, 2).matmul(wgt.view(1, -1).t()).transpose(1, 2)
out2 = F.fold(out_unf, (h, w), (1, 1))
out2 = out2.to(dtype=torch.int)
```
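(For small inputs, the unfold + matmul + fold route and `conv2d` do agree exactly — a quick sanity-check sketch with made-up sizes; while all intermediate sums stay well below `2**24`, every value is an exactly representable `float32` integer:)

```python
import torch
import torch.nn.functional as F

# Tiny made-up example: 4x5 input, 3x3 all-ones kernel -> 2x3 output.
# All sums are small integers, so float32 is exact and both routes
# produce bit-identical results.
x = torch.arange(3 * 4 * 5, dtype=torch.float32).reshape(1, 3, 4, 5)
k = torch.ones((1, 3, 3, 3), dtype=torch.float32)

x_unf = F.unfold(x, (3, 3))                                       # (1, 27, 6)
y_unf = x_unf.transpose(1, 2).matmul(k.view(1, -1).t()).transpose(1, 2)
y = F.fold(y_unf, (2, 3), (1, 1))                                 # (1, 1, 2, 3)

print(torch.equal(y, F.conv2d(x, k)))  # True
```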

Method 3

```
out3 = F.conv2d(inp, wgt, bias=None, stride=1, padding=0)
out3 = out3.to(dtype=torch.int)
```

And here is a comparison of the results:

```
>>> h*w
347604

>>> (out1 == out2).sum().item()
327338
>>> (out2 == out3).sum().item()
344026
>>> (out1 == out3).sum().item()
330797

>>> out1.shape
(1, 498, 698)
>>> out2.shape
(1, 1, 498, 698)
>>> out3.shape
(1, 1, 498, 698)
```

Their data types are all `int`, so floating point shouldn’t affect the result. When I use a square input such as `h=500` and `w=500`, all three results match. But not for non-square inputs, such as the one above with `h=500` and `w=700`. Any insight?

It’s not completely true that you are using `int32` dtypes, as the actual convolutions are performed in `float32`.
If you check the errors, you would see that the results start to diverge at the higher end of the value range; in particular, you are running into the expected rounding behavior of `float32`.
The Wikipedia article on the single-precision floating-point format explains when `float32` starts to round integer values.

The interesting range for you is:

> Integers between `2**24 = 16777216` and `2**25 = 33554432` round to a multiple of 2 (even number)
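This threshold is easy to reproduce directly (a minimal standalone sketch, not taken from the thread above):

```python
import torch

# At 2**24, float32 runs out of mantissa bits for consecutive integers:
x = torch.tensor(2**24, dtype=torch.float32)
print((x + 1) == x)  # tensor(True): 2**24 + 1 rounds back to 2**24

# float64 has 53 mantissa bits, so the same integer is still exact:
y = torch.tensor(2**24, dtype=torch.float64)
print((y + 1) == y)  # tensor(False)
```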

If you check the maximum values, you would see that the output falls into this range:

```
print(out1[-1, -1, -1] > 2**24)
# tensor(True)

print(out1[-1, -1, -1].float())
# tensor(18881046.)
print(out1[-1, -1, -1].float() + 1)
# tensor(18881048.)
```

If you use `float64`, your results would match, since the integer values in the output range wouldn’t be rounded.
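A small sketch of that fix (the shapes and the `2**23` offset are made up for illustration — the offset just pushes the sums into the range where `float32` spacing exceeds 1, while `float64` stays exact for integers up to `2**53`):

```python
import torch
import torch.nn.functional as F

# Each output sums 27 inputs near 2**23, so the exact integer results
# land around 2.26e8, where float32 can only represent multiples of 16.
inp = (torch.arange(3 * 6 * 8, dtype=torch.float64) + 2**23).reshape(1, 3, 6, 8)
wgt = torch.ones((1, 3, 3, 3), dtype=torch.float64)

out64 = F.conv2d(inp, wgt).to(dtype=torch.long)                  # exact in float64
out32 = F.conv2d(inp.float(), wgt.float()).to(dtype=torch.long)  # already rounded

print((out64 == out32).all().item())  # False
```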


Thank you very much for your answer, ptrblck. This question has been bugging me for a while haha. My suffering and pain are gone now. Thank you again.
