`torch.cuda.is_available()` allocates unwanted memory?

I want to read how much free memory each of my GPU devices has, so that I can automatically assign the least used device to a new process I'm launching. For this, I'm using the following function:

import torch

def get_least_used_gpu():
    """Return the name of the GPU that has the most free memory.

    Returns:
        str: The device string of the GPU with the most free memory,
        or "cpu" if no GPU is available.
    """
    if not torch.cuda.is_available():
        return "cpu"
    # mem_get_info(i) returns (free, total) in bytes for device i.
    free_memory = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
    best_gpu = free_memory.index(max(free_memory))
    return f"cuda:{best_gpu}"

However, I realized that this function was allocating up to an additional 1.2 GB of memory on all unused GPUs. After debugging, I found that the torch.cuda.is_available() call was responsible (previously I thought it was the torch.cuda.mem_get_info() call). If I disable the if statement, only 200 MB are allocated on the other GPUs. What's the reason for that?

Thank you,

I cannot reproduce the issue using a current source build; I see a driver initialization overhead of ~3 MB on the device. torch.cuda.is_available() will also use NVML if it's available and will thus not call into cuInit. Also, based on the size of your context, CUDA's lazy module loading might not be in use, so you have either disabled it or are using an old PyTorch release.
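If you want to avoid the risk of touching CUDA contexts entirely, the memory query can also be done directly through NVML. A minimal sketch, assuming the pynvml package is installed (it is an optional dependency, not part of PyTorch); querying NVML this way does not create a CUDA context on any device:

    # Sketch: pick the device with the most free memory via NVML only.
    try:
        import pynvml
    except ImportError:
        pynvml = None

    def least_used_gpu_nvml():
        """Return "cuda:<i>" for the GPU with the most free memory, or "cpu"."""
        if pynvml is None:
            return "cpu"
        try:
            pynvml.nvmlInit()
        except pynvml.NVMLError:
            return "cpu"  # no NVIDIA driver available
        try:
            count = pynvml.nvmlDeviceGetCount()
            # nvmlDeviceGetMemoryInfo() reports free/total/used in bytes.
            free = [
                pynvml.nvmlDeviceGetMemoryInfo(
                    pynvml.nvmlDeviceGetHandleByIndex(i)
                ).free
                for i in range(count)
            ]
            if not free:
                return "cpu"
            return f"cuda:{free.index(max(free))}"
        finally:
            pynvml.nvmlShutdown()

    print(least_used_gpu_nvml())

The returned "cuda:<i>" string can then be passed to torch.device() in the launched process without any CUDA initialization happening in the launcher.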
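If lazy module loading is indeed disabled in your setup, it can be requested via an environment variable on CUDA 11.7+ (it must be set before the process starts; your_script.py is a placeholder here):

    export CUDA_MODULE_LOADING=LAZY
    python your_script.py

With lazy loading active, kernels are loaded on first use rather than all at once, which is what keeps the per-device context footprint small.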