I’m getting a `RuntimeError: cuFFT error: CUFFT_EXEC_FAILED` when calling `torch.irfft` on arrays of particular sizes while using multiple GPUs (I’m on an AWS p3.8xlarge). The test case below was distilled from my larger application, which explains the particular way the slices and array sizes are determined.
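```python
import os

# Make CUDA kernel launches synchronous so the error is reported at the failing call.
# This must be set before torch initializes CUDA.
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

import torch

sz = 65
n_gpus = 2
for i in range(n_gpus):
    # Half-spectrum with a trailing (real, imag) dimension, as the legacy irfft expects
    arr = torch.rand((sz, sz//2 + 1, 2), dtype=torch.float32, device=i)
    # Get the crash (legacy API: torch.irfft was removed in PyTorch 1.8)
    torch.irfft(arr, 2, signal_sizes=(sz, sz))
```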
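For anyone retrying this on a newer release: `torch.irfft` no longer exists in PyTorch 1.8+, where the `torch.fft` module replaces it. The sketch below is a rough equivalent of the repro using `torch.fft.irfft2`, assuming the trailing size-2 dimension holds (real, imaginary) pairs as the legacy API expected. The CPU fallback is my own addition so the snippet runs on machines without a GPU; it is not part of the original failing setup.

```python
import torch

sz = 65  # odd size that triggered the error in the original report

# Use every visible GPU; fall back to the CPU so the sketch still runs without CUDA.
n_gpus = torch.cuda.device_count()
devices = [torch.device("cuda", i) for i in range(n_gpus)] or [torch.device("cpu")]

results = []
for dev in devices:
    # Same layout as the original repro: the last dim holds (real, imag) pairs.
    arr = torch.rand((sz, sz // 2 + 1, 2), dtype=torch.float32, device=dev)
    # Reinterpret the trailing pair dimension as a complex tensor.
    spectrum = torch.view_as_complex(arr)
    # 2-D inverse real FFT; s=(sz, sz) plays the role of the old signal_sizes.
    out = torch.fft.irfft2(spectrum, s=(sz, sz))
    results.append(out)
```

Each entry in `results` should be a real `sz` x `sz` float32 tensor on the corresponding device.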
Setting `n_gpus` to 1, or setting `sz` (the array size) to most even numbers, seems to avoid this error. However, I would like to understand how to avoid it for these particular settings, because I don’t want someone to run my code with an array size that happens to trigger it.
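To see which sizes are actually affected on a given machine, a small sweep over sizes and devices can help. This is a minimal sketch; `find_failing_sizes` is a hypothetical helper, and it calls `torch.fft.irfft2` (PyTorch 1.8+). On an older release, the assumption is that you would swap that line for the original `torch.irfft(arr, 2, signal_sizes=(sz, sz))` call. Note that a genuine CUDA fault can leave the CUDA context unusable, so results after the first failure may be unreliable; running each size in a fresh process is the more robust option.

```python
import torch

def find_failing_sizes(sizes, devices):
    """Hypothetical helper: return (size, device, message) for each inverse FFT that raises."""
    failures = []
    for sz in sizes:
        for dev in devices:
            arr = torch.rand((sz, sz // 2 + 1, 2), dtype=torch.float32, device=dev)
            try:
                # On PyTorch < 1.8, replace with: torch.irfft(arr, 2, signal_sizes=(sz, sz))
                torch.fft.irfft2(torch.view_as_complex(arr), s=(sz, sz))
                if dev.type == "cuda":
                    # Surface any asynchronous CUDA error inside this try block.
                    torch.cuda.synchronize(dev)
            except RuntimeError as err:
                failures.append((sz, str(dev), str(err)))
    return failures

# All visible GPUs, or the CPU if none are available.
devices = [torch.device("cuda", i) for i in range(torch.cuda.device_count())]
devices = devices or [torch.device("cpu")]

failures = find_failing_sizes(range(2, 130), devices)
for sz, dev, msg in failures:
    print(f"sz={sz} on {dev}: {msg}")
```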