Outputs of F.conv2d are different when input is 5D tensor

I’m trying to do convolution on 5D tensors, but the two methods below give different outputs. The problem is that with some batch sizes the outputs match and with others they don’t in CUDA mode. However, the two methods always give the same outputs when I run on the CPU.

I think this may be caused by CUDA memory. Does anyone know how to fix this problem?

import torch
import torch.nn.functional as F

x = torch.randn([batch_size, N, in_channels, in_width, in_height]).to("cuda")
w = torch.randn([out_channels, in_channels, k, k]).to("cuda")

# the first method
y_shape = [x.shape[0], x.shape[1]]
# merge the first two dimensions 
x_flat = x.flatten(0, 1).contiguous()  
y1 = F.conv2d(x_flat, w, None)
y_shape.extend(y1.shape[1:])
# reshape to [batch_size, N, out_channels, out_w, out_h]
y1 = y1.view(y_shape)

# the second method
for n in range(N):
    y2 = F.conv2d(x[:,n,...], w, None)
    # compare
    print(torch.equal(y1[:, n, ...], y2))

I’m using NVIDIA GTX 2080s with CUDA 11.6.
My PyTorch version is 1.9.0.

When I set the batch size to 256, the outputs are all different. But when the batch size is 128, they are all the same.
In my example,

batch_size = 256 #128
N = 16
in_channels = 3
in_width = 32
in_height = 32
out_channels = 256 
k = 3

Thanks for any help!

Hi Gary!

Check whether the discrepancy is consistent with floating-point round-off
error. (An easy way to do this is to repeat the computation after converting
your tensors to .double(), which should cause the round-off error to drop by
several orders of magnitude.)
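
For example, reusing the names from your snippet (just a minimal sketch; the float64 conversion is only a diagnostic, not something to keep in your real code):

x64 = x.double()
w64 = w.double()

y1_64 = F.conv2d(x64.flatten(0, 1), w64, None).view(y_shape)
for n in range(N):
    y2_64 = F.conv2d(x64[:, n, ...], w64, None)
    # if the original mismatch is just round-off, this maximum difference
    # should shrink by several orders of magnitude relative to float32
    print((y1_64[:, n, ...] - y2_64).abs().max().item())

In single precision you might also compare with torch.allclose(y1[:, n, ...], y2, atol=something like 1e-5) rather than torch.equal(), since bit-wise equality is too strict a test here.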

Round-off errors are to be expected because changing the size of your
tensors can cause PyTorch to change the specific order in which the
computations are performed, leading to results that are mathematically
equivalent but differ by round-off error.
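
As a self-contained illustration (made-up numbers, unrelated to your convolution), merely re-grouping the same float32 sum already changes the last bits of the result:

import torch

vals = torch.randn(10000, dtype=torch.float32)
s_flat = vals.sum()                           # one long reduction
s_grouped = vals.view(100, 100).sum(1).sum()  # same values, different grouping
# mathematically identical, but typically differ by a small round-off amount
print((s_flat - s_grouped).abs().item())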

Best.

K. Frank