RuntimeError for torch.nn.Conv2d on GPU

Hi,

I had several CNN models working earlier this year, but they recently stopped working on my machine’s GPU and now raise a RuntimeError. PyTorch is installed on this machine via conda-forge. It was updated to v2.7.1 a few weeks ago; I downgraded to v2.6.0, but the models still do not work.

These CNN models run fine on the machine’s CPUs. Several fully-connected models also run fine on both the CPUs and the GPU of the same machine.

It looks to be an issue with torch.nn.Conv2d on the GPU:

>>> import torch
>>> # sample data
>>> tc = torch.rand(1, 3, 32, 32)
>>> tg = tc.to("cuda")
>>> tc
tensor([[[[0.0561, 0.9214, 0.1704,  ..., 0.5051, 0.7754, 0.6535],
          [0.0134, 0.1657, 0.0521,  ..., 0.0093, 0.5785, 0.1803],
          [0.5017, 0.2229, 0.7540,  ..., 0.7423, 0.1124, 0.1602],
          ...,
          [0.3201, 0.4387, 0.9313,  ..., 0.4546, 0.1426, 0.6919],
          [0.8512, 0.6011, 0.8389,  ..., 0.5267, 0.1820, 0.7977],
          [0.4766, 0.8551, 0.9409,  ..., 0.9309, 0.7472, 0.6058]]]])
>>> tg
tensor([[[[0.0561, 0.9214, 0.1704,  ..., 0.5051, 0.7754, 0.6535],
          [0.0134, 0.1657, 0.0521,  ..., 0.0093, 0.5785, 0.1803],
          [0.5017, 0.2229, 0.7540,  ..., 0.7423, 0.1124, 0.1602],
          ...,
          [0.3201, 0.4387, 0.9313,  ..., 0.4546, 0.1426, 0.6919],
          [0.8512, 0.6011, 0.8389,  ..., 0.5267, 0.1820, 0.7977],
          [0.4766, 0.8551, 0.9409,  ..., 0.9309, 0.7472, 0.6058]]]],
       device='cuda:0')

Layers for testing:

>>> # test layers
>>> f = torch.nn.Linear(3*32*32, 5000)
>>> c = torch.nn.Conv2d(3, 5, 3)

Linear layer runs fine on CPUs and GPU:

>>> f.cpu()
Linear(in_features=3072, out_features=5000, bias=True)
>>> f(tc.flatten())
tensor([ 0.2194,  0.1203, -0.1636,  ..., -0.2210,  0.1886, -0.1382],
       grad_fn=<ViewBackward0>)
>>> 
>>> f.cuda()
Linear(in_features=3072, out_features=5000, bias=True)
>>> f(tg.flatten())
tensor([ 0.2194,  0.1203, -0.1636,  ..., -0.2210,  0.1886, -0.1382],
       device='cuda:0', grad_fn=<ViewBackward0>)

Conv2d layer runs fine on CPUs, but not on GPU:

>>> c.cpu()
Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
>>> c(tc)
tensor([[[[ 0.0431,  0.3884,  0.4317,  ...,  0.1419,  0.0886,  0.2196],
          [ 0.0216,  0.1064,  0.1294,  ..., -0.1102,  0.1265,  0.1598],
          [-0.1942, -0.2963, -0.1341,  ...,  0.0938, -0.1480,  0.0014],
          ...,
          [ 0.2905, -0.0233,  0.2386,  ...,  0.0513, -0.0833,  0.1232],
          [-0.0871,  0.2138, -0.1608,  ...,  0.1215, -0.0795,  0.2471],
          [ 0.2254,  0.2198,  0.0156,  ...,  0.3266, -0.0789, -0.0808]]]],
       grad_fn=<ConvolutionBackward0>)
>>> 
>>> c.cuda()
Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
>>> c(tg)
Traceback (most recent call last):
  File "<python-input-57>", line 1, in <module>
    c(tg)
    ~^^^^
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
           ~~~~~~~~^
        input, weight, bias, self.stride, self.padding, self.dilation, self.groups
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
RuntimeError: GET was unable to find an engine to execute this computation

I searched for this specific RuntimeError, but I don’t think an out-of-memory issue is the cause for this toy Conv2d code (I also had some larger CNN models running on the same machine earlier this year).

Output of nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GT 1030         Off |   00000000:01:00.0  On |                  N/A |
| 41%   42C    P8            N/A  /   19W |     417MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Would appreciate any help or advice on troubleshooting this RuntimeError. Thank you!

You could try an older PyTorch build that uses an older cuDNN version, or alternatively disable cuDNN for your workload on the Pascal GPU via torch.backends.cudnn.enabled = False.
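
For anyone landing here later, a minimal sketch of that workaround might look like the following. The layer and input mirror the toy example from the question. The CPU fallback and the scoped variant using the torch.backends.cudnn.flags context manager are additions for illustration and were not part of the suggestion above.

```python
import torch

# Use the GPU when one is visible; fall back to CPU so the sketch still runs.
device = "cuda" if torch.cuda.is_available() else "cpu"

conv = torch.nn.Conv2d(3, 5, 3).to(device)
x = torch.rand(1, 3, 32, 32, device=device)

# Option 1: disable cuDNN globally, so convolutions use PyTorch's
# non-cuDNN kernels for the rest of the process.
torch.backends.cudnn.enabled = False
out_global = conv(x)

# Option 2 (illustrative addition): disable cuDNN only inside a block.
# The previous flag values are restored when the block exits.
torch.backends.cudnn.enabled = True
with torch.backends.cudnn.flags(enabled=False):
    out_scoped = conv(x)

print(out_global.shape, out_scoped.shape)
```

Both outputs should have shape (1, 5, 30, 30). Disabling cuDNN may make convolutions slower than the cuDNN path, so this is probably best treated as a workaround rather than a permanent setting.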

Thank you very much!

It turns out cuDNN was updated at some point from v9.10 to v9.12. Disabling cuDNN with the flag you mentioned would let me keep cuDNN v9.12 and get the CNN models running again. However, I decided to revert to cuDNN v9.10 instead, and the models also run fine with that (even after updating PyTorch to v2.7.1).
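
In case it helps others check for the same kind of silent cuDNN update, here is a small sketch that prints the versions PyTorch actually sees at runtime. It only reports information and changes nothing; the variable name cudnn_version is just for illustration.

```python
import torch

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)  # None on CPU-only builds

# Returns an integer version code, or None when cuDNN is not available.
cudnn_version = torch.backends.cudnn.version()
print("cuDNN:", cudnn_version)

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    # Pascal cards report a major compute capability of 6.
    print("Compute capability:", torch.cuda.get_device_capability(0))
```

Comparing the printed cuDNN value before and after an environment update should show whether the library changed underneath an otherwise unchanged PyTorch version.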