[Multiple processes on the same GPU device]


I have access to a server where I can use one GPU at a time, with 40 GB of memory. Since each of my experiments takes only about 7 GB, I sometimes run several experiments with different hyperparameters at the same time. However, I have noticed that when I do this, the experiments run much more slowly, even though both the CPU and the GPU still have plenty of free resources.

My guess is that some resource is not fully isolated when I run multiple processes on the same GPU. Is there a way to fully isolate processes on the same GPU so that I can take advantage of the free memory?

PS: I know that SLURM may be able to do something like this by scheduling the GPU among different users, but since I do not have root access, I would like to know whether there is a simpler or alternative solution.

Best Regards,

GPUs, for the most part, don't have great resource isolation. For newer GPUs, you might want to take a look at MIG (Multi-Instance GPU) support.
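
In case it helps, here is a minimal sketch of how you could check from Python whether MIG is currently enabled, and how a process could be pinned to one MIG instance. It assumes `nvidia-smi` is on your PATH and that your driver supports the `mig.mode.current` query field. The function names and the `MIG-...` identifier are hypothetical placeholders; the real instance identifiers are the ones listed by `nvidia-smi -L`. As far as I know, turning MIG on and creating instances needs administrator privileges, so without root you can probably only check the mode and use instances that an admin has already created.

```python
from __future__ import annotations

import os
import subprocess


def parse_mig_modes(csv_output: str) -> list[tuple[str, str]]:
    """Parse 'index, mig.mode.current' CSV rows into (index, mode) pairs."""
    rows = []
    for line in csv_output.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2:
            rows.append((parts[0], parts[1]))
    return rows


def query_mig_modes() -> list[tuple[str, str]]:
    """Ask nvidia-smi for the MIG mode of every visible GPU.

    Assumes nvidia-smi is installed; raises if it is missing or fails.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,mig.mode.current",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_mig_modes(result.stdout)


def env_for_mig_instance(mig_uuid: str) -> dict[str, str]:
    """Copy the current environment and restrict CUDA to one MIG instance.

    mig_uuid is a placeholder for an identifier reported by `nvidia-smi -L`.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env
```

You would then call `query_mig_modes()` once to see whether the GPU reports `Enabled`, and pass `env=env_for_mig_instance(...)` to `subprocess.Popen` when launching each experiment, so that each one only sees its own instance.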

Oh, MIG support seems to be what I was looking for. Thanks! If anyone else knows of another tool, I would be glad to hear about it.