Hey @Diego, the launching script will launch multiple sub-processes, which might inherit the CUDA_VISIBLE_DEVICES value you passed on the command line. A workaround would be setting CUDA_VISIBLE_DEVICES in main.py before loading any CUDA-related packages. Note that the recommended way to use DDP is one-process-per-device, i.e., each process should exclusively run on one GPU. If you want this, you need to set CUDA_VISIBLE_DEVICES to a different value for each subprocess.
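A minimal sketch of that workaround, to go at the very top of main.py. It assumes the launcher exports LOCAL_RANK into each subprocess's environment (torchrun does; torch.distributed.launch does so with --use_env). The helper name pin_process_to_gpu and the GPU id list are hypothetical, so adjust them to your machine.

```python
import os
from typing import Sequence


def pin_process_to_gpu(local_rank: int, gpu_ids: Sequence[str]) -> str:
    """Restrict the current process to exactly one physical GPU.

    Hypothetical helper: picks the GPU id at position `local_rank` and
    writes it to CUDA_VISIBLE_DEVICES for this process only.
    """
    if not 0 <= local_rank < len(gpu_ids):
        raise ValueError(
            f"local_rank {local_rank} has no GPU in {list(gpu_ids)}"
        )
    device = gpu_ids[local_rank]
    os.environ["CUDA_VISIBLE_DEVICES"] = device
    return device


# Assumption: LOCAL_RANK is set by the launcher; default to 0 otherwise.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# Assumption: the machine has four GPUs with ids 0..3.
pin_process_to_gpu(local_rank, gpu_ids=["0", "1", "2", "3"])

# Import torch (and anything else that initializes CUDA) only after the
# environment variable is set, otherwise the setting may be ignored:
# import torch
# torch.cuda.set_device(0)  # the single visible GPU is always ordinal 0
```

With this in place each subprocess sees a single GPU, and inside the process that GPU is addressed as cuda:0 regardless of its physical id.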
BTW, what’s the default CUDA_VISIBLE_DEVICES value on your machine? I would assume the script should be able to see all devices by default if CUDA_VISIBLE_DEVICES wasn’t set. And when the program throws RuntimeError: CUDA error: invalid device ordinal, do you know which device it tries to access?
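To answer both questions, something like the following could be dropped into main.py and run in every subprocess. This is only a diagnostic sketch: describe_visible_devices is a hypothetical helper, and the torch import is guarded in case it is run outside the training environment.

```python
import os


def describe_visible_devices(env=os.environ) -> str:
    """Summarize what CUDA_VISIBLE_DEVICES allows this process to see."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return "CUDA_VISIBLE_DEVICES is unset (all GPUs visible)"
    ids = [d.strip() for d in raw.split(",") if d.strip()]
    return f"CUDA_VISIBLE_DEVICES={raw!r} ({len(ids)} device(s) visible)"


print(f"pid {os.getpid()}: {describe_visible_devices()}")

try:
    import torch

    # Valid device ordinals are 0 .. device_count() - 1; asking for
    # anything outside that range is what raises "invalid device ordinal".
    print("torch.cuda.device_count():", torch.cuda.device_count())
except ImportError:
    print("torch is not installed in this environment")
```

Note that ordinals are relative to the visible set: with CUDA_VISIBLE_DEVICES=2,3 the process sees two devices, addressed as cuda:0 and cuda:1, so requesting cuda:2 would fail.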