Hi, our office has a server and several people share its GPUs. I want to occupy a single card so that other people's programs cannot affect mine. My approach is to allocate all available memory at the beginning and then let PyTorch re-use that cached memory, as follows:
```python
import os
import torch

def check_mem():
    # Query total and used memory (in MiB) from nvidia-smi.
    # Raw string, so the backslashes in the Windows path are not treated as
    # escape sequences; <path\to\NVSMI> is a placeholder for the install dir.
    cmd = r'"<path\to\NVSMI>\nvidia-smi" --query-gpu=memory.total,memory.used --format=csv,nounits,noheader'
    mem = os.popen(cmd).read().split(",")
    return mem

def main():
    total, used = check_mem()
    total = int(total)
    used = int(used)
    max_mem = int(total * 0.8)   # target: occupy 80% of the card
    block_mem = max_mem - used   # MiB still to claim
    # A float32 tensor of shape (256, 1024, n) occupies exactly n MiB.
    x = torch.rand((256, 1024, block_mem)).cuda()
    del x  # returned to PyTorch's caching allocator, not to the OS
    # do things here

if __name__ == "__main__":
    main()
```
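The shape `(256, 1024, block_mem)` is chosen so that the reservation tensor occupies exactly `block_mem` MiB. The arithmetic can be checked without a GPU (the `block_mem` value below is just an example):

```python
# Each float32 element is 4 bytes, so a (256, 1024, n) tensor holds
# 256 * 1024 * 4 bytes = 1 MiB per unit of n.
block_mem = 1000  # example: reserve ~1000 MiB

n_elements = 256 * 1024 * block_mem
size_bytes = n_elements * 4             # float32
size_mib = size_bytes // (1024 * 1024)

print(size_mib)  # 1000
```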
However, the approach above reliably leads to an out-of-memory error on my machine:
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 3.95 GiB total capacity; 395.42 MiB already allocated; 15.38 MiB free; 2.36 GiB cached)
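For reference, this is how one can inspect what the caching allocator holds, to compare the "allocated" and "cached" numbers against the error message (a sketch; on the older PyTorch version that produced the error above, `torch.cuda.memory_reserved` may instead be called `torch.cuda.memory_cached`):

```python
import torch

def mib(nbytes):
    # bytes -> MiB, for readable reporting
    return nbytes / 2**20

if torch.cuda.is_available():
    # memory held by live tensors vs. memory the allocator keeps cached
    print(f"allocated: {mib(torch.cuda.memory_allocated(0)):.1f} MiB")
    print(f"cached:    {mib(torch.cuda.memory_reserved(0)):.1f} MiB")
else:
    print("CUDA not available")
```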
It looks like PyTorch cannot re-use this cached memory. Can anyone give me some tips on how to solve this problem?
Thank you.