Yesterday I discovered that pynvml doesn’t respect CUDA_VISIBLE_DEVICES, so the device ids seen by torch.cuda and NVML don’t match when you restrict which GPUs are used via CUDA_VISIBLE_DEVICES. This breaks software that relies on pynvml.
I wrote a re-mapper to solve the problem. Maybe it will be of use to others.
Usage:
[...]
torch_gpu_id = torch.cuda.current_device()
nvml_gpu_id = get_nvml_gpu_id(torch_gpu_id)
handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_gpu_id)
[....]
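For readers who just want the idea, here is a minimal sketch of what such a re-mapper can look like. It is not necessarily the exact implementation referred to above, and it makes a few assumptions: CUDA_VISIBLE_DEVICES holds only comma-separated integer indices (not GPU UUIDs), and CUDA enumerates devices in the same order as NVML, which in general requires setting CUDA_DEVICE_ORDER=PCI_BUS_ID. The `env` parameter is a hypothetical addition that only exists to make the function easy to test.

```python
import os


def get_nvml_gpu_id(torch_gpu_id, env=None):
    """Map a torch.cuda device index to the matching NVML device index.

    Sketch only: assumes CUDA_VISIBLE_DEVICES contains integer indices and
    that CUDA and NVML enumerate GPUs in the same (PCI bus) order.
    `env` is a hypothetical parameter used for testing; it defaults to
    the real process environment.
    """
    env = os.environ if env is None else env
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # No remapping in effect: both libraries see every GPU.
        return torch_gpu_id

    # torch index i corresponds to the i-th entry of CUDA_VISIBLE_DEVICES.
    ids = [int(x) for x in visible.split(",") if x.strip()]
    if not 0 <= torch_gpu_id < len(ids):
        raise ValueError(
            f"torch device {torch_gpu_id} is not listed in "
            f"CUDA_VISIBLE_DEVICES={visible!r}"
        )
    return ids[torch_gpu_id]
```

For example, with CUDA_VISIBLE_DEVICES=2,3, torch device 0 maps to NVML device 2 and torch device 1 maps to NVML device 3.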
I also suggested that pynvml integrate it directly: https://github.com/gpuopenanalytics/pynvml/issues/28