torch.irfft gives a cuFFT failure for certain input sizes

I’m getting RuntimeError: cuFFT error: CUFFT_EXEC_FAILED when calling torch.irfft on arrays of particular sizes while using multiple GPUs (I’m on an AWS p3.8xlarge). The test case below was distilled from my larger application, which is why the slices and array sizes are determined the way they are.

import os
# Make CUDA kernel launches synchronous so the error is reported at the failing call
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
import torch

sz = 65
n_gpus = 2

for i in range(n_gpus):
    arr = torch.rand((sz, sz//2 + 1, 2), dtype=torch.float32, device=i)
    # This call raises RuntimeError: cuFFT error: CUFFT_EXEC_FAILED
    torch.irfft(arr, 2, signal_sizes=(sz, sz))

Setting n_gpus to 1, or setting sz (the array size) to most even numbers, seems to avoid the error. I would like to understand how to avoid it for these particular settings, because I don’t want someone to run my code with an array size that unexpectedly triggers it.
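
To pin down exactly which sizes and devices are affected, rather than relying on "most even numbers", something like the sweep below might help. It is only a rough sketch: find_failing_sizes and run_irfft are names I made up for illustration, and run_irfft assumes a CUDA machine and a PyTorch version that still provides torch.irfft with the signature used in the test case above. A CUDA failure may also leave the process in a bad state, so anything reported after the first failure should be treated with caution.

```python
def find_failing_sizes(run, sizes, devices):
    """Call run(size, device) for every combination and collect the failures.

    Returns a list of (size, device, error message) tuples, one for each
    call that raised RuntimeError.
    """
    failures = []
    for sz in sizes:
        # Same order as the test case: one size, then each device in turn
        for dev in devices:
            try:
                run(sz, dev)
            except RuntimeError as exc:
                failures.append((sz, dev, str(exc)))
    return failures


def run_irfft(sz, device):
    """Hypothetical wrapper around the exact call from the test case."""
    import torch  # imported here so the sweep helper itself needs no GPU

    arr = torch.rand((sz, sz // 2 + 1, 2), dtype=torch.float32, device=device)
    torch.irfft(arr, 2, signal_sizes=(sz, sz))
```

On a multi-GPU machine this could be called as find_failing_sizes(run_irfft, range(2, 129), range(torch.cuda.device_count())), printing each returned tuple. Passing the call in as a function keeps the sweep logic separate from the cuFFT call, so the same loop can be reused with a different transform or with only one device.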