I’m trying to do convolution on 5D tensors, but I get different outputs from the two methods below. The strange part is that, depending on the batch size, the outputs sometimes match and sometimes don’t in CUDA mode. However, the two methods always produce the same outputs on the CPU.
I think this may be caused by CUDA memory. Does anyone know how to fix this problem?
import torch
import torch.nn.functional as F

x = torch.randn([batch_size, N, in_channels, in_width, in_height]).to("cuda")
w = torch.randn([out_channels, in_channels, k, k]).to("cuda")
# the first method
y_shape = [x.shape[0], x.shape[1]]
# merge the first two dimensions
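# (contiguous() guarantees a memory layout that the view() below can work with)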
x_flat = x.flatten(0, 1).contiguous()
y1 = F.conv2d(x_flat, w, None)
y_shape.extend(y1.shape[1:])
# reshape to [batch_size, N, out_channels, out_w, out_h]
y1 = y1.view(y_shape)
# the second method
for n in range(N):
    y2 = F.conv2d(x[:, n, ...], w, None)
    # compare with the matching slice from the first method
    print(torch.equal(y1[:, n, ...], y2))
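    # extra check (sketch): is the mismatch just floating-point noise?
    # atol=1e-5 is an arbitrary tolerance I picked, not a recommended value
    print(torch.allclose(y1[:, n, ...], y2, atol=1e-5))
    print((y1[:, n, ...] - y2).abs().max().item())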
I’m using an NVIDIA RTX 2080 Super with CUDA 11.6.
My PyTorch version is 1.9.0.
When I set the batch size to 256, the outputs are all different. But when I set it to 128, the outputs are all the same.
In my example,
batch_size = 256  # or 128
N = 16
in_channels = 3
in_width = 32
in_height = 32
out_channels = 256
k = 3
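One guess on my side: could cuDNN be selecting different convolution algorithms for the two batch sizes? Below is a minimal sketch of what I mean by forcing deterministic kernels; this is only a hypothesis I want to test, not something I’ve confirmed fixes it.
# restrict cuDNN to deterministic algorithms (hypothesis test, not a confirmed fix)
torch.backends.cudnn.benchmark = False      # disable algorithm auto-tuning
torch.backends.cudnn.deterministic = True   # use only deterministic kernels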
Thanks for any help!