Why is 2D conv slower than equivalent 1D conv?

I have two networks: one uses 2D conv layers where one of the kernel dimensions has size 1, and the other uses the equivalent 1D conv layers.

So why does the latter run faster if they are mathematically the same?

Could you share a minimal, executable code snippet that reproduces the slowdown, as well as the output of python -m torch.utils.collect_env, please?

If you are using the GPU (and thus cuDNN for the convolutions), both calls should be dispatched to the same internal cuDNN kernel, so I'm unsure what might be causing the difference (unless some use cases were filtered out due to known issues).
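As a side note, a quick way to check which convolution backends your build can dispatch to (this is a generic sketch, not specific to the setup in this thread) is:

```python
import torch

# Which backends are available determines where the convolution is dispatched.
print(torch.cuda.is_available())             # True only with a CUDA build + visible GPU
print(torch.backends.cudnn.is_available())   # cuDNN (GPU convolutions)
print(torch.backends.mkldnn.is_available())  # oneDNN / MKL-DNN (CPU convolutions)
```

If CUDA is unavailable, the convolutions are running on a CPU code path instead of cuDNN.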

Oh, I'm running it on my MacBook's CPU.

I haven't tested this exact setup, but something like the following:

import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=5, stride=1),
    nn.BatchNorm1d(8),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)

vs.

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(5, 1), stride=1),
    nn.BatchNorm2d(8),
    nn.Dropout(0.2),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(2, 1), stride=2),
)

Output of python -m torch.utils.collect_env:

Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.4 (arm64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: version 3.21.3

Python version: 3.9 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.8
[pip3] torch==1.10.2
[pip3] torchaudio==0.9.1
[pip3] torchinfo==1.7.0
[pip3] torchmetrics==0.5.1
[pip3] torchsampler==0.1.1
[pip3] torchsummary==1.5.1
[pip3] torchvision==0.11.3
[conda] numpy                     1.21.2           py39h1f3b974_0    conda-forge
[conda] pytorch                   1.8.0           cpu_py39hc766e51_1    conda-forge
[conda] pytorch-lightning         1.4.8                    pypi_0    pypi
[conda] torch                     1.10.2                   pypi_0    pypi
[conda] torchaudio                0.9.1                    pypi_0    pypi
[conda] torchinfo                 1.7.0              pyhd8ed1ab_0    conda-forge
[conda] torchmetrics              0.5.1                    pypi_0    pypi
[conda] torchsampler              0.1.1                    pypi_0    pypi
[conda] torchsummary              1.5.1                    pypi_0    pypi
[conda] torchvision               0.9.0a0+83171d6          pypi_0    pypi
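To make the comparison concrete, here is a minimal timing sketch of just the two conv layers (the input shape here is an assumption chosen for illustration); it first verifies the two ops are numerically equivalent by copying the 1D weights into the 2D layer, then times each:

```python
import torch
import torch.nn as nn
from torch.utils.benchmark import Timer

torch.manual_seed(0)
x1 = torch.randn(32, 1, 1000)  # (N, C, L) for Conv1d
x2 = x1.unsqueeze(-1)          # (N, C, L, 1) for Conv2d

conv1d = nn.Conv1d(1, 8, kernel_size=5)
conv2d = nn.Conv2d(1, 8, kernel_size=(5, 1))

# Copy the 1D weights into the 2D layer so both compute the same result.
with torch.no_grad():
    conv2d.weight.copy_(conv1d.weight.unsqueeze(-1))  # (8,1,5) -> (8,1,5,1)
    conv2d.bias.copy_(conv1d.bias)

out1 = conv1d(x1)             # (32, 8, 996)
out2 = conv2d(x2).squeeze(-1) # (32, 8, 996)
print(torch.allclose(out1, out2, atol=1e-5))

for label, model, x in [("Conv1d", conv1d, x1), ("Conv2d", conv2d, x2)]:
    t = Timer(stmt="model(x)", globals={"model": model, "x": x})
    print(label, t.timeit(100).mean)
```

torch.utils.benchmark.Timer handles warm-up and averaging, so the two mean times should be directly comparable.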

Thanks for the update. Unfortunately, I'm not familiar enough with the backend for Mac CPU workloads, so we would need to wait for experts on this architecture.

Oh, if it's Mac CPU specific, then I would consider it a non-issue. This slowdown wouldn't occur on CUDA GPUs?

It should not, but you would need to profile it in your setup (GPU, PyTorch version, CUDA + cuDNN + cuBLAS versions, etc.), as the performance depends on the environment used.
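A minimal profiling sketch for that comparison, assuming the layer sizes above and falling back to CPU if no GPU is present:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

conv1d = nn.Conv1d(1, 8, kernel_size=5).to(device)
conv2d = nn.Conv2d(1, 8, kernel_size=(5, 1)).to(device)
x1 = torch.randn(32, 1, 1000, device=device)
x2 = x1.unsqueeze(-1)  # trailing width-1 dim for the 2D conv

# Warm up so one-time setup cost is not attributed to either op.
for _ in range(10):
    conv1d(x1)
    conv2d(x2)

with profile(activities=activities) as prof:
    for _ in range(100):
        conv1d(x1)
        conv2d(x2)

table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(table)
```

The table lists the aten-level conv ops separately, so the per-op time of the 1D and 2D variants can be compared directly; on a CUDA setup, sorting by cuda_time_total instead would show the kernel times.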

Thanks! I'll get back to you if I still see the slowdown.