I am attempting to use a package that relies on PyTorch, and I keep getting errors when I ask it to select GPUs 0 and 1, saying there are not that many GPUs:
CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1670525552411/work/aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch.
The script sets the GPUs via the following line of code, where gpu_list is '0,1':

os.environ['CUDA_VISIBLE_DEVICES'] = gpu_list  # gpu_list = '0,1'
I have noticed that the cluster I am using is partitioned into MIG instances. How can I set CUDA_VISIBLE_DEVICES to multiple MIG instances for a single script? I found the following thread, but it covers using MIG instances in separate parallel jobs rather than together in one job: Access GPU partitions in MIG
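For reference, here is a minimal sketch of what I understand the MIG addressing to look like: MIG instances are not referred to by plain ordinals like '0,1' but by the UUIDs (or 'MIG-GPU-.../gi/ci' style identifiers, depending on driver version) that `nvidia-smi -L` reports. The UUIDs below are made-up placeholders, not real devices, and I am not sure whether CUDA will actually enumerate more than one MIG instance in a single process even if both are listed.

```python
import os

# Placeholder MIG identifiers; on a real cluster these come from `nvidia-smi -L`.
mig_uuids = [
    "MIG-11111111-2222-3333-4444-555555555555",  # first MIG instance (hypothetical)
    "MIG-66666666-7777-8888-9999-000000000000",  # second MIG instance (hypothetical)
]

# Pass the MIG identifiers, comma-separated, instead of ordinal indices like '0,1'.
# This must run before torch (or anything that initializes CUDA) is imported.
os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(mig_uuids)
print(os.environ['CUDA_VISIBLE_DEVICES'])
```

Is this the right way to expose two MIG instances to one script, or is the one-MIG-per-process limitation what is causing the assert above?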