Hi all,
I am trying to implement Conv3d manually with the unfold function. However, when I compare the results against the built-in nn.Conv3d, they agree on CPU but NOT on CUDA. Is there a reason behind this?
You can reproduce it with the following code:
import torch
import torch.nn as nn
import torch.nn.functional as F
channels = 5
h, w, d = 4, 4, 4
def test(device):
    image = torch.randn(channels, h, w, d).to(device)  # input image
    kh, kw, kd = 3, 3, 3  # kernel size
    dh, dw, dd = 1, 1, 1  # stride
    # Create conv
    conv = nn.Conv3d(channels, 10, (kh, kw, kd), padding='same', bias=False).to(device)
    filt = conv.weight
    # Manual approach: pad by 1 on each side (matches padding='same' for a 3x3x3 kernel)
    patches = F.pad(image, (1,) * 6)
    patches = patches.unfold(1, kh, dh).unfold(2, kw, dw).unfold(3, kd, dd)
    patches = patches.contiguous().view(channels, -1, kh, kw, kd)
    nb_windows = patches.size(1)
    # Now we have to shift the windows into the batch dimension.
    # Maybe there is another way without .permute, but this should work
    patches = patches.permute(1, 0, 2, 3, 4)
    # Calculate the conv operation manually
    res = patches.flatten(1) @ filt.flatten(1).transpose(0, 1)
    res = res.transpose(0, 1)  # out_channels, output_pixels
    res = res.unflatten(1, (h, w, d))
    # Module approach
    out = conv(image)
    print('max abs error ', (out - res).abs().max())
print('Test on CPU')
test(torch.device("cpu")) # 4.7684e-07
print('Test on CUDA')
test(torch.device("cuda")) # 0.0005
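In case it helps narrow things down, here is a condensed version of the same comparison that also takes a dtype, so the check can be repeated in float64 on CPU, where accumulation-order effects are negligible (the helper name `max_abs_error` is my own, not from the snippet above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def max_abs_error(device, dtype):
    # Same setup as the repro above, but parameterized over dtype
    torch.manual_seed(0)
    channels, h, w, d = 5, 4, 4, 4
    image = torch.randn(channels, h, w, d, device=device, dtype=dtype)
    conv = nn.Conv3d(channels, 10, 3, padding='same', bias=False).to(device=device, dtype=dtype)
    # Manual unfold-based convolution, identical to the steps above
    patches = F.pad(image, (1,) * 6)
    patches = patches.unfold(1, 3, 1).unfold(2, 3, 1).unfold(3, 3, 1)
    patches = patches.contiguous().view(channels, -1, 3, 3, 3).permute(1, 0, 2, 3, 4)
    res = (patches.flatten(1) @ conv.weight.flatten(1).T).T.unflatten(1, (h, w, d))
    return (conv(image) - res).abs().max().item()

print(max_abs_error(torch.device('cpu'), torch.float64))
```

If the float64 CPU error comes out at machine-epsilon level, that would suggest the unfold logic itself is correct and the CUDA gap is purely a precision/accumulation difference.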