Changing CUDA_VISIBLE_DEVICES no longer has any effect after calling
torch.cuda.device_count() (or torch.cuda.is_available()). It seems these functions freeze
the value of CUDA_VISIBLE_DEVICES at the first call. Is this intended behavior or a bug? This behavior caused some trouble in torch_xla with multiprocessing, as discussed here: Calling torch.cuda.is_available() with multiprocessing exhausts memory. · Issue #3347 · pytorch/xla · GitHub.
As explained in the linked issue,
CUDA_VISIBLE_DEVICES has to be set before the first CUDA call.
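To illustrate why the order matters, here is a minimal sketch (not torch's actual internals) that models CUDA's one-time initialization: the environment variable is read once, cached, and later changes are ignored. The `device_count` function below is a hypothetical stand-in for `torch.cuda.device_count()`.

```python
import os

_visible_devices = None  # cached at first "CUDA" call, like the driver's one-time init


def device_count():
    # Stand-in for torch.cuda.device_count(): in this model,
    # CUDA_VISIBLE_DEVICES is read exactly once, on first use.
    global _visible_devices
    if _visible_devices is None:
        _visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in _visible_devices.split(",") if d])


os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(device_count())  # 2 -- the value is captured here

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
print(device_count())  # still 2 -- the later change is ignored
```

This mirrors the observed behavior: in a multiprocessing setup, each child process must set CUDA_VISIBLE_DEVICES before its first CUDA call, e.g. before anything touches `torch.cuda`.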