I want to read how much free memory each of my GPU devices has, so that I can automatically assign the least-used device to a new process I'm launching. For this, I'm using the following function:
```python
import torch

def get_least_used_gpu():
    """Return the name of the GPU that has the most free memory.

    Returns:
        str: The GPU with the most free memory (i.e. the least used one),
            or "cpu" if no GPU is available.
    """
    if not torch.cuda.is_available():
        return "cpu"
    # mem_get_info(i) returns (free_bytes, total_bytes) for device i
    free_memory = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
    best_gpu = torch.argmax(torch.tensor(free_memory)).item()
    return f"cuda:{best_gpu}"
```
However, I realized that this function was allocating up to an additional 1.2 GB of memory on every unused GPU. After debugging, I found that the call responsible for this is `torch.cuda.is_available()` (I previously thought it was the `torch.cuda.mem_get_info()` call). If I remove the `if` statement, only about 200 MB are allocated on the other GPUs. What is the reason for that?
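(For reference, one way to observe this per-GPU allocation from outside the process is via the pynvml bindings from the nvidia-ml-py package; nvidia-smi reports the same numbers. A minimal sketch:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    # info.used is the memory currently allocated on device i, in bytes
    print(f"GPU {i}: {info.used / 1024**2:.0f} MiB used")
pynvml.nvmlShutdown()
```
)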
Thank you,
Marc