Hi,
I had several CNN models working earlier this year, but they have recently stopped working on my machine’s GPU and now raise a RuntimeError instead. PyTorch is installed on this machine via conda-forge. It was updated to v2.7.1 a few weeks ago; I have since downgraded back to v2.6.0, but the models still fail.
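Because downgrading PyTorch alone did not help, it may be worth checking whether the CUDA libraries the environment actually loads also changed. One possibility (an assumption about this environment, not something verified here) is that conda left an upgraded cuDNN in place after torch itself was pinned back. A minimal sketch that prints the relevant versions using only standard torch attributes (version_report is a hypothetical helper name):

```python
import torch


def version_report() -> dict:
    """Collect the library versions most relevant to a GPU-only regression."""
    return {
        # PyTorch release string, e.g. '2.6.0'
        "torch": torch.__version__,
        # CUDA version this PyTorch build was compiled against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # cuDNN version loaded at runtime, encoded as a single integer; None if unavailable
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in version_report().items():
        print(f"{key}: {value}")
```

Running this in the current environment and comparing the cuDNN value with what a known-good environment reports would show whether that library moved along with the PyTorch update.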
The same CNN models still run fine on the machine’s CPUs, and several fully connected models run fine on both the CPUs and the GPU of the same machine.
It looks like an issue with torch.nn.Conv2d on the GPU:
>>> # sample data
>>> tc = torch.rand(1, 3, 32, 32)
>>> tg = tc.to("cuda")
>>> tc
tensor([[[[0.0561, 0.9214, 0.1704,  ..., 0.5051, 0.7754, 0.6535],
          [0.0134, 0.1657, 0.0521,  ..., 0.0093, 0.5785, 0.1803],
          [0.5017, 0.2229, 0.7540,  ..., 0.7423, 0.1124, 0.1602],
          ...,
          [0.3201, 0.4387, 0.9313,  ..., 0.4546, 0.1426, 0.6919],
          [0.8512, 0.6011, 0.8389,  ..., 0.5267, 0.1820, 0.7977],
          [0.4766, 0.8551, 0.9409,  ..., 0.9309, 0.7472, 0.6058]]]])
>>> tg
tensor([[[[0.0561, 0.9214, 0.1704,  ..., 0.5051, 0.7754, 0.6535],
          [0.0134, 0.1657, 0.0521,  ..., 0.0093, 0.5785, 0.1803],
          [0.5017, 0.2229, 0.7540,  ..., 0.7423, 0.1124, 0.1602],
          ...,
          [0.3201, 0.4387, 0.9313,  ..., 0.4546, 0.1426, 0.6919],
          [0.8512, 0.6011, 0.8389,  ..., 0.5267, 0.1820, 0.7977],
          [0.4766, 0.8551, 0.9409,  ..., 0.9309, 0.7472, 0.6058]]]],
       device='cuda:0')
Layers for testing:
>>> # test layers
>>> f = torch.nn.Linear(3*32*32, 5000)
>>> c = torch.nn.Conv2d(3, 5, 3)
The Linear layer runs fine on both the CPUs and the GPU:
>>> f.cpu()
Linear(in_features=3072, out_features=5000, bias=True)
>>> f(tc.flatten())
tensor([ 0.2194,  0.1203, -0.1636,  ..., -0.2210,  0.1886, -0.1382],
       grad_fn=<ViewBackward0>)
>>>
>>> f.cuda()
Linear(in_features=3072, out_features=5000, bias=True)
>>> f(tg.flatten())
tensor([ 0.2194,  0.1203, -0.1636,  ..., -0.2210,  0.1886, -0.1382],
       device='cuda:0', grad_fn=<ViewBackward0>)
The Conv2d layer runs fine on the CPUs, but not on the GPU:
>>> c.cpu()
Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
>>> c(tc)
tensor([[[[ 0.0431,  0.3884,  0.4317,  ...,  0.1419,  0.0886,  0.2196],
          [ 0.0216,  0.1064,  0.1294,  ..., -0.1102,  0.1265,  0.1598],
          [-0.1942, -0.2963, -0.1341,  ...,  0.0938, -0.1480,  0.0014],
          ...,
          [ 0.2905, -0.0233,  0.2386,  ...,  0.0513, -0.0833,  0.1232],
          [-0.0871,  0.2138, -0.1608,  ...,  0.1215, -0.0795,  0.2471],
          [ 0.2254,  0.2198,  0.0156,  ...,  0.3266, -0.0789, -0.0808]]]],
       grad_fn=<ConvolutionBackward0>)
>>>
>>> c.cuda()
Conv2d(3, 5, kernel_size=(3, 3), stride=(1, 1))
>>> c(tg)
Traceback (most recent call last):
  File "<python-input-57>", line 1, in <module>
    c(tg)
    ~^^^^
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tstravers/software/miniforge3/envs/tester/lib/python3.13/site-packages/torch/nn/modules/conv.py", line 549, in _conv_forward
    return F.conv2d(
           ~~~~~~~~^
        input, weight, bias, self.stride, self.padding, self.dilation, self.groups
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
RuntimeError: GET was unable to find an engine to execute this computation
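The error is raised from inside F.conv2d and appears to come from PyTorch’s cuDNN convolution path, so one way to narrow it down is to rerun the same toy layer with cuDNN switched off through torch.backends.cudnn.enabled; PyTorch then uses its own CUDA convolution kernels instead. A sketch, assuming cuDNN is the only failing component (run_conv is a hypothetical helper name):

```python
import torch


def run_conv(device: str, use_cudnn: bool = True) -> torch.Tensor:
    """Run the toy Conv2d(3, 5, 3) on `device` and return the output on the CPU."""
    torch.manual_seed(0)  # identical weights and input on every call
    conv = torch.nn.Conv2d(3, 5, 3)
    x = torch.rand(1, 3, 32, 32)  # created on the CPU so every device sees the same values
    conv, x = conv.to(device), x.to(device)

    previous = torch.backends.cudnn.enabled
    torch.backends.cudnn.enabled = use_cudnn  # global switch, restored in `finally`
    try:
        with torch.no_grad():
            return conv(x).cpu()
    finally:
        torch.backends.cudnn.enabled = previous


if __name__ == "__main__":
    reference = run_conv("cpu")
    if torch.cuda.is_available():
        # Success here while the cuDNN run fails would point at cuDNN itself
        no_cudnn = run_conv("cuda", use_cudnn=False)
        print("GPU without cuDNN matches CPU:",
              torch.allclose(reference, no_cudnn, atol=1e-5))
        try:
            run_conv("cuda", use_cudnn=True)
            print("GPU with cuDNN: ok")
        except RuntimeError as err:
            print("GPU with cuDNN failed:", err)
```

If the cuDNN-free run works, setting torch.backends.cudnn.enabled = False at the top of a training script might serve as a temporary workaround, at some cost in speed, while the cuDNN side is investigated.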
I searched for this specific RuntimeError, but I don’t think it’s an out-of-memory issue for this toy Conv2d example (I was also running larger CNN models on this machine earlier this year).
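To put a number on the out-of-memory question, the tensors involved can be sized by hand with the output-size formula from the torch.nn.Conv2d documentation. A pure-Python sketch, assuming float32 (4 bytes per element) and ignoring cuDNN workspace and CUDA context overhead; both function names are illustrative:

```python
def conv2d_out_size(size: int, kernel: int, stride: int = 1,
                    padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a Conv2d along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def toy_conv_bytes(n: int = 1, c_in: int = 3, h: int = 32, w: int = 32,
                   c_out: int = 5, k: int = 3, bytes_per_elem: int = 4) -> dict:
    """Bytes needed for input, parameters and output of Conv2d(c_in, c_out, k)."""
    h_out, w_out = conv2d_out_size(h, k), conv2d_out_size(w, k)
    element_counts = {
        "input": n * c_in * h * w,
        "weight+bias": c_out * c_in * k * k + c_out,
        "output": n * c_out * h_out * w_out,
    }
    return {name: count * bytes_per_elem for name, count in element_counts.items()}


if __name__ == "__main__":
    sizes = toy_conv_bytes()
    print(sizes, "total:", sum(sizes.values()), "bytes")
```

For the layer in this post that gives 12,288 bytes of input, 560 bytes of weights and bias, and 18,000 bytes of output, about 30 KiB in total, against roughly 1,631 MiB free on the card according to the nvidia-smi output in this post (2048 MiB total, 417 MiB in use).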
Output of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GT 1030         Off |   00000000:01:00.0  On |                  N/A |
| 41%   42C    P8            N/A  /   19W |     417MiB /   2048MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
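The “CUDA Version: 12.9” in the nvidia-smi header is the highest CUDA version the driver supports, not necessarily the CUDA runtime PyTorch was built with. The device details can also be read from inside PyTorch for comparison; a GT 1030 should report compute capability 6.1 (sm_61). A sketch (device_report is a hypothetical helper name):

```python
import torch


def device_report(index: int = 0) -> dict:
    """Summarize what PyTorch sees for one GPU; returns {} if CUDA is unavailable."""
    if not torch.cuda.is_available():
        return {}
    major, minor = torch.cuda.get_device_capability(index)
    free, total = torch.cuda.mem_get_info(index)  # bytes, as reported by the CUDA driver
    return {
        "name": torch.cuda.get_device_name(index),
        "capability": f"sm_{major}{minor}",
        # GPU architectures this PyTorch binary was compiled for
        "arch_list": torch.cuda.get_arch_list(),
        "free_mib": free // 2**20,
        "total_mib": total // 2**20,
    }


if __name__ == "__main__":
    print(device_report())
```

The free and total figures should roughly agree with the Memory-Usage column above, which would be another sign that memory is not the limiting factor.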
Would appreciate any help or advice on troubleshooting this RuntimeError. Thank you!