Hi, our office has a server and several people share its GPUs.
However, I want to occupy a single card so that other users' jobs cannot affect my program.
My approach is to allocate all of the available memory at the beginning and then let PyTorch re-use that cached memory, as follows:
import os
import torch

def check_mem():
    # Query total and used memory (in MiB) via nvidia-smi
    cmd = (r'"<path\to\NVSMI>\nvidia-smi" '
           '--query-gpu=memory.total,memory.used --format=csv,nounits,noheader')
    mem = os.popen(cmd).read().split(",")
    return mem

def main():
    total, used = check_mem()
    total = int(total)
    used = int(used)
    max_mem = int(total * 0.8)   # aim to hold 80% of the card
    block_mem = max_mem - used   # MiB still to claim
    # 256 * 1024 float32 values = 1 MiB, so this tensor occupies block_mem MiB
    x = torch.rand((256, 1024, block_mem)).cuda()
    del x  # the memory should stay in PyTorch's cache for later re-use
    # do things here

main()
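To see what the allocator is doing, I check its statistics before and after the del. This is just a minimal sketch: torch.cuda.memory_reserved() is named torch.cuda.memory_cached() in older PyTorch releases, and the 1000 MiB size is illustrative.

import torch

def report(tag):
    # Bytes held by live tensors vs. bytes reserved by the caching allocator
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")

report("start")
x = torch.rand((256, 1024, 1000)).cuda()  # roughly 1000 MiB of float32
report("after allocation")
del x
report("after del")  # allocated drops to ~0, reserved stays high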
However, the above approach reliably leads to an out-of-memory error on my machine:
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 3.95 GiB total capacity; 395.42 MiB already allocated; 15.38 MiB free; 2.36 GiB cached)
It looks as if PyTorch cannot re-use this cached memory.
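For reference, this is the minimal pattern that reproduces the failure for me. The sizes here are illustrative for my 4 GiB card; I expected the second, tiny allocation to be served from the cached block.

import torch

# Claim a large block, then drop the tensor; PyTorch's caching allocator
# keeps the memory reserved instead of returning it to the driver.
x = torch.rand((256, 1024, 2500)).cuda()  # ~2500 MiB of float32
del x

# I expected this ~1 MiB allocation to be carved out of the cached block,
# but instead it raises the CUDA out-of-memory error quoted above.
y = torch.rand((256, 1024, 1)).cuda()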
Can anyone give me some tips on how to solve this problem?