Changing CUDA_VISIBLE_DEVICES no longer has any effect after calling
torch.cuda.device_count() (or torch.cuda.is_available()). It seems these functions freeze
the value of CUDA_VISIBLE_DEVICES at the first call. Is this intended behavior or a bug? This behavior caused some trouble in torch_xla with multiprocessing, as discussed here: Calling torch.cuda.is_available() with multiprocessing exhausts memory. · Issue #3347 · pytorch/xla · GitHub.
As explained in the linked issue,
CUDA_VISIBLE_DEVICES has to be set before the first CUDA call.
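To illustrate why the order matters, here is a minimal sketch (not torch's actual internals) that models CUDA's one-time initialization: the environment variable is read once, cached, and later changes are ignored. The `device_count` function below is a hypothetical stand-in for `torch.cuda.device_count()`.

```python
import os

_visible_devices = None  # cached at first "CUDA" call, like the driver's one-time init


def device_count():
    # Stand-in for torch.cuda.device_count(): in this model,
    # CUDA_VISIBLE_DEVICES is read exactly once, on first use.
    global _visible_devices
    if _visible_devices is None:
        _visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in _visible_devices.split(",") if d])


os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(device_count())  # 2 -- the value is captured here

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(device_count())  # still 2 -- the later change is ignored
```

This mirrors the observed behavior: in a multiprocessing setup, each child process must set CUDA_VISIBLE_DEVICES before its first CUDA call, e.g. before anything touches `torch.cuda`.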