Converting Conv3d to Conv2d gives different outputs

I tried to convert a Conv3d to Conv2d by splitting the 3D kernel along the temporal axis, but the outputs of the two differ.
I expected output1 and output2 to be the same. Is there a detail I've overlooked?

Code:

```python
import torch

t = torch.randn(1, 1, 4, 4, dtype=torch.float32)
conv3d = torch.nn.Conv3d(in_channels=1, out_channels=1, kernel_size=(2, 2, 2), stride=(2, 2, 2), bias=False, dtype=torch.float32)
conv2d1 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 2), stride=(2, 2), bias=False, dtype=torch.float32)
conv2d2 = torch.nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(2, 2), stride=(2, 2), bias=False, dtype=torch.float32)

print(conv3d.weight.dtype)
# Split the 3D kernel along the temporal (depth) axis: slice 0 -> conv2d1, slice 1 -> conv2d2.
with torch.no_grad():
    conv2d1.weight.copy_(conv3d.weight[:, :, 0, :, :])
    conv2d2.weight.copy_(conv3d.weight[:, :, 1, :, :])

# 4 samples of shape (C=1, D=2, H=2, W=2); the temporal kernel covers the whole depth.
x = torch.stack([t, t], dim=0).view(-1, 1, 2, 2, 2)
output1 = conv3d(x).view(-1, 1)
# 2D formulation: convolve each temporal frame with its kernel slice, then sum.
output2 = (conv2d1(x[:, :, 0, :, :]) + conv2d2(x[:, :, 1, :, :])).view(-1, 1)
print(output1 == output2)
```
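To check that the kernel split itself is correct, here is a minimal sketch that performs the same decomposition with `torch.nn.functional.conv3d` and `conv2d` in float64, where rounding noise is far smaller, so a correct weight mapping should agree to roughly machine precision. It assumes the temporal kernel size equals the input depth and there is no padding or dilation, as in the code above; the helper `conv3d_as_conv2d_sum` and the tensor shapes are hypothetical choices for illustration.

```python
import torch
import torch.nn.functional as F


def conv3d_as_conv2d_sum(x, weight, stride_hw):
    """Hypothetical helper: emulate F.conv3d on a depth-kT input by summing kT conv2d calls.

    Assumes x is (N, C_in, kT, H, W), weight is (C_out, C_in, kT, kH, kW),
    and the temporal kernel spans the whole input depth (no padding, no dilation).
    """
    k_t = weight.shape[2]
    assert x.shape[2] == k_t, "temporal kernel must cover the full input depth"
    out = 0
    for k in range(k_t):
        # Each temporal slice of the 3D kernel acts as a 2D kernel on the matching input frame.
        out = out + F.conv2d(x[:, :, k], weight[:, :, k], stride=stride_hw)
    return out


torch.manual_seed(0)
x = torch.randn(4, 3, 2, 8, 8, dtype=torch.float64)
w = torch.randn(5, 3, 2, 2, 2, dtype=torch.float64)

ref = F.conv3d(x, w, stride=(2, 2, 2)).squeeze(2)  # output depth is 1, so drop that dim
alt = conv3d_as_conv2d_sum(x, w, stride_hw=(2, 2))
print(ref.shape, (ref - alt).abs().max().item())
```

If the float64 difference is tiny, the mapping is right and any float32 mismatch comes from rounding, not from the way the weights were sliced.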

Versions

PyTorch version: 2.4.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.0.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.3)
CMake version: version 3.28.1
Libc version: N/A

Python version: 3.10.13 (main, Sep 11 2023, 08:16:02) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.0.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3 Max

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] torch==2.4.1
[pip3] torchaudio==2.4.1
[pip3] torchvision==0.19.1
[conda] numpy 1.26.3 pypi_0 pypi
[conda] torch 2.4.1 pypi_0 pypi
[conda] torchaudio 2.4.1 pypi_0 pypi
[conda] torchvision 0.19.1 pypi_0 pypi

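Your image is not displayed for me, but keep in mind that small numerical mismatches are expected when different kernels are selected for the different workloads: the Conv3d call and the two Conv2d calls may accumulate the same products in a different order (and the Conv2d version adds an extra rounding step when the two partial results are summed), and floating-point addition is not associative, so the last bits can differ.
Thus check the maximum absolute error, e.g. `(output1 - output2).abs().max()`, in your comparison instead of assuming bitwise-equal results.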
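A minimal sketch of such a comparison, based on the setup from the question. The seed, the random input of shape (4, 1, 2, 2, 2), and the `atol=1e-5` tolerance are illustrative assumptions, not values from the thread. The first few lines show in isolation why `==` is fragile: in float32, regrouping the same three additions changes the result.

```python
import torch

torch.manual_seed(0)

# Floating-point addition is not associative: the grouping changes the rounding.
a, b, c = torch.tensor([1e8, -1e8, 1.0], dtype=torch.float32)
print(((a + b) + c).item(), (a + (b + c)).item())  # prints 1.0 0.0

# Same layer setup as in the question.
conv3d = torch.nn.Conv3d(1, 1, kernel_size=2, stride=2, bias=False)
conv2d1 = torch.nn.Conv2d(1, 1, kernel_size=2, stride=2, bias=False)
conv2d2 = torch.nn.Conv2d(1, 1, kernel_size=2, stride=2, bias=False)
with torch.no_grad():
    conv2d1.weight.copy_(conv3d.weight[:, :, 0])
    conv2d2.weight.copy_(conv3d.weight[:, :, 1])

x = torch.randn(4, 1, 2, 2, 2)
with torch.no_grad():
    output1 = conv3d(x).view(-1, 1)
    output2 = (conv2d1(x[:, :, 0]) + conv2d2(x[:, :, 1])).view(-1, 1)

# Compare with a tolerance instead of exact equality.
max_err = (output1 - output2).abs().max().item()
print(f"max abs error: {max_err:.3e}")
print(torch.allclose(output1, output2, atol=1e-5))
```

An error on the order of float32 epsilon (about 1e-7 relative) indicates the two formulations agree and only the accumulation order differs.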
