How to occupy all the GPU memory at the beginning of training

In TensorFlow, we can occupy all the GPU memory at the beginning of training. I'm wondering whether the same is possible in PyTorch?



There's little to no advantage to allocating memory beforehand. TF uses a static graph, so it is able to do so, but there is little gain from it.

@SimonW, thanks. When we run the model on a shared server, I found that PyTorch-based programs are easily killed due to out-of-memory errors during training (the code runs fine at the beginning, but gets killed after several hundred iterations). So I'd like to occupy enough memory at the beginning, since TF-based programs rarely get killed because of memory issues during training…

You are probably doing something wrong then. In usual cases, memory usage shouldn't increase as you run more iterations. This is a good starting point to debug:

@SimonW I don't mean that GPU memory grows across iterations. I manually release GPU memory during training, so usage goes up and down; when my memory occupation is low, other users start running their code on the same GPU, and then my program is killed because of a memory issue. That's why I'd like to know whether we can occupy all the memory at the beginning of training.


First of all, usually a better solution would be to have a proper task scheduler on your server/cluster, one that assigns (among other computation resources) GPUs to a given task.
One convention can be that user code is not allowed to change CUDA_VISIBLE_DEVICES and the scheduler creates the task process with the needed CUDA_VISIBLE_DEVICES value already set.
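As a minimal sketch of that convention, the scheduler could launch each task with `CUDA_VISIBLE_DEVICES` already set, and user code would only ever read the variable, never set it. The launch command shown in the comment is an assumed example, not part of any real scheduler:

```python
import os

# Hypothetical convention: the scheduler launches each task like
#   CUDA_VISIBLE_DEVICES=2,3 python train.py
# so the assigned physical GPUs appear inside the process as
# cuda:0, cuda:1, ... User code only reads the variable; by the
# convention above, it must never set it.

assigned = os.environ.get("CUDA_VISIBLE_DEVICES")
if assigned is None:
    print("No GPU assignment from the scheduler; all GPUs visible")
else:
    print(f"Scheduler assigned physical GPUs: {assigned}")
```

One nice property of this setup is that training code never needs to know which physical GPU it got; device indices are always relative to the visible set.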

Having said that, the rest of my answer assumes you cannot have such scheduler, and don’t mind using somewhat hacky alternatives.

If you just want to occupy some memory, you can create a dummy buffer at the beginning of your code and delete it just before training starts.
This lets you hold memory while you are still doing your initialization.
Assuming training itself then uses more memory, this reduces the chance of someone "joining" your GPU, especially if they do any kind of check on the GPUs they try to allocate.
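A minimal sketch of that dummy buffer, assuming you want to hold roughly 4 GB (the size, the helper name `reserve_gpu_memory`, and the use of float32 are all illustrative assumptions):

```python
import torch

# Hypothetical sketch: reserve a chunk of GPU memory right at startup
# by allocating a dummy tensor, then drop it just before training.

def reserve_gpu_memory(num_bytes, device="cuda"):
    """Allocate a dummy float32 buffer of roughly num_bytes."""
    n = num_bytes // 4  # float32 = 4 bytes per element
    return torch.empty(n, dtype=torch.float32, device=device)

if torch.cuda.is_available():
    dummy = reserve_gpu_memory(4 * 1024**3)  # hold ~4 GB during init
    # ... build model, dataloaders, etc. ...
    del dummy  # release just before training starts
```

Note that after `del`, the memory goes back to PyTorch's caching allocator rather than to the driver, so `nvidia-smi` still shows it as used by your process (unless you call `torch.cuda.empty_cache()`), which in this scenario is exactly what you want.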

Another option is to have some function "def allocate_gpus(gpus_num)" that all users call to allocate GPUs to their process (this works locally, only at the single-server level). Inside it there would be logic such as: a GPU may be allocated only if its occupied memory is less than, for example, 200 MB. If you reach a convention that everyone using the server calls the same function, the dummy tensor I mentioned earlier will work in the vast majority of cases.
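The convention above could be sketched as follows; it queries `nvidia-smi` for per-GPU used memory and treats a GPU as free when it is under the 200 MB threshold. The names (`allocate_gpus`, `parse_free_gpus`, `MAX_USED_MB`) are illustrative, not an existing API:

```python
import subprocess

MAX_USED_MB = 200  # threshold from the convention above

def parse_free_gpus(smi_csv, max_used_mb=MAX_USED_MB):
    """Pick GPU indices whose used memory is below the threshold."""
    free = []
    for line in smi_csv.strip().splitlines():
        idx, used_mb = (int(field) for field in line.split(","))
        if used_mb < max_used_mb:
            free.append(idx)
    return free

def allocate_gpus(gpus_num):
    # Ask nvidia-smi for per-GPU used memory as plain CSV
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    free = parse_free_gpus(out)
    if len(free) < gpus_num:
        raise RuntimeError(f"only {len(free)} free GPUs, need {gpus_num}")
    return free[:gpus_num]
```

Since this only inspects memory usage at one moment, two users calling it simultaneously can still race for the same GPU, which is another reason a real scheduler is the more robust solution.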

To conclude, the only really scalable solution is using a cluster/server resources scheduler, but if you can’t have that, I hope that the tricks/hacks that I mentioned will help you.

Thanks, @yoelshoshan. I'll try the cluster/server resource scheduler.