Yesterday, I discovered that pynvml doesn't respect CUDA_VISIBLE_DEVICES, so the device ids used by torch.cuda and NVML no longer match once you change which GPUs are used via CUDA_VISIBLE_DEVICES, which breaks software that uses both.
I wrote a re-mapper to solve the problem; maybe it will be of use to others.
    [...]
    torch_gpu_id = torch.cuda.current_device()
    nvml_gpu_id = get_nvml_gpu_id(torch_gpu_id)
    handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_gpu_id)
    [....]
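The body of get_nvml_gpu_id isn't shown above, so here is a minimal sketch of what such a re-mapper might look like. It is not necessarily the exact code I linked. It assumes CUDA_VISIBLE_DEVICES holds comma-separated integer ids (not GPU UUIDs) and that CUDA and NVML enumerate the GPUs in the same order, which may require setting CUDA_DEVICE_ORDER=PCI_BUS_ID. The `visible_devices` parameter is a hypothetical addition that makes the function easy to try without touching the environment.

```python
import os


def get_nvml_gpu_id(torch_gpu_id, visible_devices=None):
    """Map a torch.cuda device index to the matching NVML device index.

    Sketch only. Assumptions: CUDA_VISIBLE_DEVICES contains comma-separated
    integer ids (no UUIDs), and CUDA and NVML enumerate GPUs in the same order.
    """
    if visible_devices is None:
        # `visible_devices` is a hypothetical override; by default read the env.
        visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible_devices is None:
        # Variable not set: torch sees every GPU, so the ids already match.
        return torch_gpu_id
    # torch index i refers to the i-th entry listed in CUDA_VISIBLE_DEVICES.
    ids = [int(x) for x in visible_devices.split(",")]
    return ids[torch_gpu_id]
```

For example, with CUDA_VISIBLE_DEVICES=2,3, torch device 0 maps to NVML index 2 and torch device 1 maps to NVML index 3.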
I also suggested that pynvml integrate it directly: https://github.com/gpuopenanalytics/pynvml/issues/28