Does DDP with torchrun need torch.cuda.set_device(device)?

Setting os.environ["CUDA_VISIBLE_DEVICES"] = os.environ["LOCAL_RANK"] in each process still results in torch.cuda.current_device() returning 0 everywhere.

However, according to this topic, that behavior seems to be expected. By setting CUDA_VISIBLE_DEVICES directly, “each process will only see one physical GPU that corresponds to its local_rank, i.e., cuda:0 in different processes will map to a different physical device”.
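To make sure I understand the quoted claim, here is a minimal sketch (no GPU required) of the mapping it describes. The visible_to_physical helper is purely illustrative, not part of torch; it mimics how the CUDA runtime reinterprets logical device indices once CUDA_VISIBLE_DEVICES restricts the visible set. Note that CUDA_VISIBLE_DEVICES is only honored if it is set before CUDA is initialized in the process, which would explain why current_device() still legitimately reports 0 in every worker:

```python
import os

def visible_to_physical(logical_index: int, cuda_visible_devices: str) -> int:
    """Illustrative helper: map a logical CUDA device index (what torch
    reports) to the physical GPU index, given a CUDA_VISIBLE_DEVICES string."""
    physical = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return physical[logical_index]

# Simulate 4 torchrun workers, each setting CUDA_VISIBLE_DEVICES = LOCAL_RANK
# before CUDA is initialized:
for local_rank in range(4):
    cvd = str(local_rank)  # i.e. os.environ["CUDA_VISIBLE_DEVICES"] = cvd
    # torch.cuda.current_device() would report 0 in every process,
    # but that logical 0 maps to a different physical GPU per rank:
    print(f"rank {local_rank}: cuda:0 -> physical GPU {visible_to_physical(0, cvd)}")
```

If this mental model is right, each worker's cuda:0 is a different card, so an explicit torch.cuda.set_device(local_rank) would be redundant under this scheme.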

Is this correct?