Setting `os.environ["CUDA_VISIBLE_DEVICES"] = os.environ["LOCAL_RANK"]` still results in `torch.cuda.current_device()` being equal to 0.
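For context, the setup in question looks roughly like this. This is only a sketch, assuming a `torchrun`-style launcher that exports `LOCAL_RANK` for each worker process; the key detail is that the environment variable must be set before any CUDA context is created (i.e., before `torch` touches the GPU):

```python
import os

# Fallback so the sketch also runs outside a launcher (assumption:
# torchrun-style launchers export LOCAL_RANK per process).
os.environ.setdefault("LOCAL_RANK", "0")

# Mask GPUs BEFORE torch initializes CUDA; setting this after a CUDA
# context exists has no effect on device enumeration.
os.environ["CUDA_VISIBLE_DEVICES"] = os.environ["LOCAL_RANK"]

try:
    import torch  # imported *after* masking

    if torch.cuda.is_available():
        # Each process sees exactly one GPU, enumerated as device 0,
        # even though index 0 maps to a different physical GPU per rank.
        assert torch.cuda.current_device() == 0
except ImportError:
    pass  # torch not installed; the masking logic above still applies

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With this ordering, every rank reporting device index 0 is the expected outcome rather than a bug.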
However, from that topic this behavior seems correct: by setting `CUDA_VISIBLE_DEVICES` directly, "each process will only see one physical GPU that corresponds to its local_rank, i.e., `cuda:0` in different processes will map to a different physical device." Is this correct?