Hi PyTorch Forum,
I have access to a server with an NVIDIA K80. The problem is that about 5 other people use this server alongside me. Most of them use TensorFlow with standard settings, which means that their processes allocate the full GPU memory at startup.
I use PyTorch, which dynamically allocates the memory it needs to do the calculation.
Here is the problem scenario:
1.) I start my process, which will be running for about 7 days.
2.) Two days later somebody decides to start their TensorFlow process.
3.) If my process needs more memory for some calculation, it will raise the CUDA out-of-memory exception, because the other process has allocated all of the free memory that was left.
So I was thinking: Is there a way to reserve, let's say, 3 GB of GPU memory for my process, which PyTorch can use dynamically, while the other users see my process consuming this space permanently?
Edit: My best idea so far is a little trick:
On startup you initialize 6 variables, each containing 0.5 GB of random data. Each time you get an out-of-memory exception, you delete one of these variables and retry from the point where you got the exception. That way you get something like a 3 GB buffer.
But this kind of solution is really ugly in my opinion…
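The buffer trick described above could be sketched roughly as follows. This is only an illustration: `run_with_reserve` and its parameters are hypothetical names, and the helper is written against a generic exception type so it can be tried without a GPU. With PyTorch, the reserve would be a list of CUDA tensors and the exception would be the `RuntimeError` raised on CUDA out of memory.

```python
def run_with_reserve(step, reserve, oom_error=RuntimeError):
    """Call step(); on an out-of-memory error, drop one reserve buffer and retry.

    step      -- hypothetical callable doing the memory-hungry work
    reserve   -- list of placeholder buffers allocated at startup
    oom_error -- exception type treated as "out of memory" (an assumption)
    """
    while True:
        try:
            return step()
        except oom_error:
            if not reserve:
                raise  # nothing left to give back, so re-raise the error
            reserve.pop()  # drop our reference so the buffer's memory can be reused


# Stand-in demo without a GPU: the step fails until two buffers are released.
reserve = [bytearray(16) for _ in range(6)]

def fake_step():
    if len(reserve) > 4:
        raise RuntimeError("out of memory (simulated)")
    return "done"

result = run_with_reserve(fake_step, reserve)
```

With PyTorch, `reserve` might be built from six tensors like `torch.empty((256, 1024, 512), device="cuda")`, which is about 0.5 GB each as float32. Treating every `RuntimeError` as out of memory is crude; a real version would probably inspect the message first.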
Ok, I found a solution that works for me:
On startup I query the total and used memory on the GPU, compute 80% of the total minus what is already in use, create a variable that big and put it on the GPU. Directly after doing that, I overwrite it with a small value.
While the process is running, the GPU still has 80% of its memory blocked, and PyTorch is using this space.
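import os
import torch

def check_mem():
    # Query total and used GPU memory (MiB); raw string so \t and \n in the path are not escapes
    mem = os.popen(r'"<path\to\NVSMI>\nvidia-smi" --query-gpu=memory.total,memory.used --format=csv,nounits,noheader').read().split(",")
    return mem

total, used = check_mem()
total = int(total)
used = int(used)
max_mem = int(total * 0.8)   # target: 80% of the card
block_mem = max_mem - used   # MiB still to be claimed
# 256 * 1024 float32 values = 1 MiB, so this tensor takes block_mem MiB
x = torch.rand((256,1024,block_mem)).cuda()
x = torch.rand((2,2)).cuda()   # overwrite: the big tensor is freed but stays cached
#do things here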
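The shape (256, 1024, block_mem) works because nvidia-smi reports memory in MiB when `nounits` is used, and a float32 tensor of shape (256, 1024) occupies exactly 1 MiB. A small sketch of that arithmetic follows; `block_size_mib` and `tensor_bytes` are hypothetical helper names, and the sample numbers are chosen purely for illustration.

```python
BYTES_PER_FLOAT32 = 4

def block_size_mib(total_mib, used_mib, fraction=0.8):
    """MiB to allocate so that existing usage plus our block reaches `fraction` of the card."""
    # Clamp at zero: a negative tensor dimension would be an error.
    return max(int(total_mib * fraction) - used_mib, 0)

def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Number of bytes a dense tensor of the given shape occupies."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element

# Illustrative numbers only: a card reporting 11441 MiB total and 300 MiB used.
block_mem = block_size_mib(11441, 300)
shape = (256, 1024, block_mem)
print(block_mem, "MiB ->", tensor_bytes(shape), "bytes")
```

Note that `torch.rand(...).cuda()` first builds the tensor in host memory and then copies it; something like `torch.empty(shape, device="cuda")` would probably avoid that, since the contents do not matter here.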
First of all, this is more of a cluster sharing problem from my point of view than a real need.
Anyway, your solution of allocating a tensor and then deleting it will work, because the caching allocator will keep the memory around for the next allocations. You don’t need to replace it; you can simply do del x right after creating it.
Be aware that this can have the side effect of increasing the overall memory usage of your program, and that as soon as your program comes close to running out of memory, the allocator will free all unused memory and your “memory pool” will be gone.
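One way to observe the behavior described above is to compare `torch.cuda.memory_allocated()` (bytes held by live tensors) with `torch.cuda.memory_reserved()` (bytes the caching allocator keeps from the driver). The sketch below assumes a reasonably recent PyTorch version and a CUDA device; `reserve_pool` is a hypothetical name, and `torch` is imported inside the function so the snippet still loads on machines without it.

```python
ONE_MIB_FLOAT32 = (256, 1024)  # 256 * 1024 float32 values = 1 MiB

def reserve_pool(mib):
    """Allocate `mib` MiB on the GPU, delete the tensor, and report allocator stats.

    Returns (allocated_bytes, reserved_bytes), or None when CUDA is unavailable.
    """
    import torch

    if not torch.cuda.is_available():
        return None
    # Uninitialized memory is enough here, and it is created directly on the GPU.
    x = torch.empty(ONE_MIB_FLOAT32 + (mib,), device="cuda")
    del x  # the tensor is gone, but the allocator keeps the block cached
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()
```

After such a call, nvidia-smi should still show the memory as used by the process, while the allocated figure is back near zero (assuming nothing else lives on the GPU). Calling `torch.cuda.empty_cache()` would hand the cached block back to the driver.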
Thank you for this hint!
I agree with you that this isn’t PyTorch’s concern. Sadly, our server support is really slow, so I needed some workaround.
@albanD Thank you so much.
I adapted the code above to query a single specific GPU when several cards are available:
import os

deviceid = 1
os.environ['CUDA_VISIBLE_DEVICES'] = "%d" % deviceid
# nvidia-smi prints one line per GPU; pick the line for deviceid
total, used = os.popen(
    '"nvidia-smi" --query-gpu=memory.total,memory.used --format=csv,nounits,noheader'
).read().split('\n')[deviceid].split(',')
total = int(total)
used = int(used)
print(deviceid, 'Total GPU mem:', total, 'used:', used)
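As an alternative sketch, nvidia-smi can be asked for one card directly through its `--id` option, so only that card's line is returned. nvidia-smi itself does not take `CUDA_VISIBLE_DEVICES` into account, so the index passed here is the one nvidia-smi uses. The function names are hypothetical, and `nvidia-smi` is assumed to be on the PATH.

```python
import subprocess

def build_query(device_id):
    """Command line asking nvidia-smi for total and used memory of one GPU."""
    return [
        "nvidia-smi",
        f"--id={device_id}",
        "--query-gpu=memory.total,memory.used",
        "--format=csv,nounits,noheader",
    ]

def parse_memory_line(line):
    """Parse a line such as '11441, 300' into (total_mib, used_mib)."""
    total, used = (int(field) for field in line.strip().split(","))
    return total, used

def query_gpu_memory(device_id):
    """Run nvidia-smi for one GPU and return (total_mib, used_mib)."""
    out = subprocess.run(
        build_query(device_id), capture_output=True, text=True, check=True
    ).stdout
    return parse_memory_line(out.splitlines()[0])
```

On a machine with the driver installed, `query_gpu_memory(1)` would return the two numbers for the second card as integers.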
I found that the code above leads to an out-of-memory error such as:
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 3.95 GiB total capacity; 231.38 MiB already allocated; 6.25 MiB free; 2.52 GiB cached)
Why can’t PyTorch reuse the cached memory?
As I mentioned above, this is a hack to try and pretend the memory is used. You should not need to do it, and it can have side effects because this is not what the allocator is made for!
In particular, if any allocation fails due to fragmentation, we deallocate and reallocate memory to reduce fragmentation. But since the whole memory was allocated at once, this is not possible after this hack, and so it will OOM even though it would work without it.
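The numbers in the error message quoted earlier illustrate this: most of what the allocator holds is cached but not backing any live tensor, yet the small request still fails. A rough sketch that pulls those figures out of the message follows. The regular expression is written for this particular message format, which is an assumption and not a stable interface, and `sizes_in_mib` is a hypothetical helper name.

```python
import re

MESSAGE = (
    "CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 3.95 GiB total capacity; "
    "231.38 MiB already allocated; 6.25 MiB free; 2.52 GiB cached)"
)

TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

def sizes_in_mib(message):
    """Map each labelled size in the message to a value in MiB."""
    pattern = r"([\d.]+) (KiB|MiB|GiB) (total capacity|already allocated|free|cached)"
    return {
        label: float(value) * TO_MIB[unit]
        for value, unit, label in re.findall(pattern, message)
    }

sizes = sizes_in_mib(MESSAGE)
# Assuming "cached" includes the allocated part, this is what sits idle in the cache.
cached_but_unused = sizes["cached"] - sizes["already allocated"]
print(f"cached but unused: {cached_but_unused:.2f} MiB")
```

Under that assumption, roughly 2.3 GiB is sitting in the cache while only 6.25 MiB is free on the device, which matches the explanation that a single large cached block cannot be released piecemeal.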
Here again, @dragen why do you need to do this?