Making pynvml match torch device ids (CUDA_VISIBLE_DEVICES)

Yesterday, I discovered that pynvml doesn't respect CUDA_VISIBLE_DEVICES. As a result, the GPU ids used by torch.cuda and by NVML don't match whenever you use CUDA_VISIBLE_DEVICES to choose which GPUs are used, and any software that relies on pynvml to query "its" GPU ends up looking at the wrong device.

I wrote a re-mapper to solve the problem:

Maybe it will be of use to others.
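The re-mapper itself isn't reproduced here. As a rough illustration only (not the linked code), the core idea can be sketched as a pure-Python lookup: read CUDA_VISIBLE_DEVICES, split it into physical indices, and use the torch device id as a position in that list. The sketch reuses the name `get_nvml_gpu_id` from the usage snippet; the helper `parse_visible_devices` and the `env` parameter are hypothetical. It assumes `CUDA_DEVICE_ORDER=PCI_BUS_ID`, because CUDA's default order (`FASTEST_FIRST`) can differ from NVML's PCI-bus order, and it only handles integer entries: UUID-style entries such as `GPU-...` are rejected rather than resolved.

```python
import os
from typing import List, Mapping, Optional


def parse_visible_devices(value: Optional[str]) -> Optional[List[int]]:
    """Turn a CUDA_VISIBLE_DEVICES string into a list of physical GPU indices.

    Returns None if the variable is unset (all GPUs visible).
    Hypothetical helper: only integer entries are supported; UUID-style
    entries ("GPU-...", "MIG-...") raise ValueError instead of being resolved.
    """
    if value is None:
        return None
    value = value.strip()
    if not value:
        return []  # an empty string hides every GPU
    indices = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry.isdigit():
            raise ValueError(f"unsupported CUDA_VISIBLE_DEVICES entry: {entry!r}")
        indices.append(int(entry))
    return indices


def get_nvml_gpu_id(torch_gpu_id: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Map a torch.cuda device index to the matching NVML device index.

    Assumes CUDA_DEVICE_ORDER=PCI_BUS_ID, so CUDA and NVML enumerate the
    physical GPUs in the same order. The env parameter is hypothetical and
    exists only so the mapping can be tested without touching os.environ.
    """
    env = os.environ if env is None else env
    visible = parse_visible_devices(env.get("CUDA_VISIBLE_DEVICES"))
    if visible is None:
        return torch_gpu_id  # nothing hidden: torch index == physical index
    if not 0 <= torch_gpu_id < len(visible):
        raise IndexError(f"torch device {torch_gpu_id} is not among visible GPUs {visible}")
    # torch device i is the i-th entry of CUDA_VISIBLE_DEVICES
    return visible[torch_gpu_id]
```

For example, with `CUDA_VISIBLE_DEVICES=2,3`, torch sees two devices (0 and 1), and `get_nvml_gpu_id(1)` returns 3, the index NVML uses for that same physical GPU.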

Usage:

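    import pynvml
    import torch

    pynvml.nvmlInit()
    # ...
    torch_gpu_id = torch.cuda.current_device()  # index as torch sees it (relative to CUDA_VISIBLE_DEVICES)
    nvml_gpu_id = get_nvml_gpu_id(torch_gpu_id)  # re-mapper: translate to NVML's physical index
    handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_gpu_id)
    # ...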

I also suggested that pynvml integrate this directly: https://github.com/gpuopenanalytics/pynvml/issues/28
