cuFFT half precision error using rfft2/irfft2

I’m using Automatic Mixed Precision (AMP), and within my model’s forward pass I make use of torch.fft.rfft2() and torch.fft.irfft2(). For some reason this seems to work fine on my laptop, but when run on a remote node via Slurm it throws the following error:

RuntimeError: cuFFT only supports dimensions whose sizes are powers of two when computing in half precision
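For context, here is a minimal sketch of the pattern I believe triggers it (the 100×100 size is just an illustrative non-power-of-two shape, not my actual dimensions): under AMP the activations reaching the FFT are float16, and a float16 input with non-power-of-two sizes is what cuFFT rejects.

```python
import torch

# placeholder non-power-of-two spatial size; under AMP the activations
# reaching the FFT are float16, which forces cuFFT's half-precision path
x = torch.randn(1, 1, 100, 100, device="cuda", dtype=torch.float16)

spec = torch.fft.rfft2(x)                     # raises the cuFFT power-of-two error
out = torch.fft.irfft2(spec, s=x.shape[-2:])  # never reached
```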

As far as I can tell I’m using the same PyTorch version on each.

Remote:

pytorch 2.5.1 py3.12_cuda12.4_cudnn9.1.0_0 pytorch

pytorch-cuda 12.4 hc786d27_7 pytorch

Laptop:

pytorch 2.5.1 py3.12_cuda12.4_cudnn9_0 pytorch

pytorch-cuda 12.4 h3fd98bf_7 pytorch

I could just increase the padding to the next power of 2; however, this drastically increases the size of the two matrices I then need to multiply together.
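For example (the shapes here are placeholders, not my real ones), padding via the `s` argument of rfft2/irfft2 would look roughly like this, and already turns a 540×960 input into a 1024×1024 transform:

```python
import torch

def next_pow2(n: int) -> int:
    """Smallest power of two >= n."""
    return 1 << (n - 1).bit_length()

# placeholder input size; the real shapes come from my model
x = torch.randn(1, 1, 540, 960, device="cuda", dtype=torch.float16)

# zero-pad the transform up to power-of-two sizes via the `s` argument
s = (next_pow2(x.shape[-2]), next_pow2(x.shape[-1]))  # (1024, 1024) here
spec = torch.fft.rfft2(x, s=s)
out = torch.fft.irfft2(spec, s=s)[..., : x.shape[-2], : x.shape[-1]]  # crop back to the original size
```

That roughly doubles the number of elements in this example, which is exactly the growth I’d like to avoid.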

Do you see the same error using the latest stable release?

I haven’t had a chance to check that yet, as I need to create a new env on the remote node. However, I have found the latest(?) cuFFT documentation here, which states the following:

Half-precision transforms have the following limitations:

- Minimum GPU architecture is SM_75
- Sizes are restricted to powers of two only
- Strides on the real part of real-to-complex and complex-to-real transforms are not supported
- More than one GPU is not supported
- Transforms spanning more than 4 billion elements are not supported

This seems quite explicit that my issue is due to the second point; however, I’m now further confused as to why it seems to run fine on my laptop.

Are you sure you are using the GPU and cuFFT in your laptop environment? Could you profile the workload and check the kernel names?
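For reference, a quick way to check would be something like the sketch below (FFTBlock and the 128×128 input are just placeholders standing in for the real model and batch); if the GPU and cuFFT are actually being used, the FFT work should show up as CUDA kernels in the printed table:

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

class FFTBlock(nn.Module):
    """Placeholder module: just an rfft2/irfft2 round trip."""
    def forward(self, x):
        spec = torch.fft.rfft2(x)
        return torch.fft.irfft2(spec, s=x.shape[-2:])

model = FFTBlock().cuda()
x = torch.randn(1, 1, 128, 128, device="cuda")  # placeholder power-of-two input

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        model(x)

# FFT work on the GPU should appear as CUDA kernels in this table
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=25))
```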