[Multiple processes on the same GPU device]

heitorrapela · April 10, 2023, 1:21am

Hello,

I have access to a server where I can use one GPU at a time, with 40 GB of memory. Since each of my experiments only takes about 7 GB, I sometimes run several experiments with different hparams at once. But I have noticed that when I do that, the experiments run very slowly, even though the CPU and GPU still have plenty of free resources.

My guess is that some resource is not fully isolated when I run multiple processes on the same GPU. Is there a way to fully isolate processes on the same GPU so that I can take advantage of the free memory I have?

P.S.: I know that SLURM may be able to do something like this, splitting the GPU between different users through scheduling, but since I am not root, I would like to know if there is a simpler solution or some other option.

Best Regards,

marksaroufim · April 10, 2023, 2:22am

GPUs, for the most part, don’t have great resource isolation. For newer GPUs, you might want to take a look at MIG (Multi-Instance GPU) support.
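A minimal sketch of how MIG could be used for the hparam runs described above, assuming an administrator has already enabled MIG mode and created the instances (that step needs root). Each experiment is pinned to one MIG instance by setting `CUDA_VISIBLE_DEVICES` to that instance's UUID. The `nvidia-smi -L` output format matched by the regex and the helper names (`list_mig_uuids`, `build_env`, `launch_experiments`) are assumptions for illustration, not part of any library.

```python
from __future__ import annotations

import re
import subprocess

# Assumed `nvidia-smi -L` format on a MIG-enabled GPU, e.g.
#   MIG 3g.20gb     Device  0: (UUID: MIG-<uuid>)
# Older drivers print "MIG-GPU-<uuid>/<gi>/<ci>"; the regex matches both.
# Plain GPU lines ("UUID: GPU-...") are ignored.
MIG_UUID_RE = re.compile(r"UUID:\s*(MIG-[^)\s]+)")


def list_mig_uuids(nvidia_smi_output: str) -> list[str]:
    """Extract MIG instance UUIDs from the text of `nvidia-smi -L`."""
    return MIG_UUID_RE.findall(nvidia_smi_output)


def build_env(base_env: dict[str, str], mig_uuid: str) -> dict[str, str]:
    """Return a copy of base_env that exposes only one MIG instance."""
    env = dict(base_env)  # copy so the caller's environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env


def launch_experiments(
    commands: list[list[str]],
    mig_uuids: list[str],
    base_env: dict[str, str],
) -> list[subprocess.Popen]:
    """Start one process per command, each on its own MIG instance."""
    if len(commands) > len(mig_uuids):
        raise ValueError("more experiments than available MIG instances")
    return [
        subprocess.Popen(cmd, env=build_env(base_env, uuid))
        for cmd, uuid in zip(commands, mig_uuids)
    ]
```

For example, with `output` holding the text of `nvidia-smi -L`, calling `launch_experiments([["python", "train.py", "--lr", "1e-3"], ["python", "train.py", "--lr", "1e-4"]], list_mig_uuids(output), dict(os.environ))` would start two runs of a hypothetical `train.py`, each seeing a single device. Because each MIG instance gets its own slice of compute and memory, the runs should not compete for the same resources the way plain co-located processes do.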

heitorrapela · April 10, 2023, 3:24pm

Oh, this MIG support seems to be what I was looking for. Thanks! If anyone knows of another tool, that would be great too.