Train multiple independent models on a single GPU


I have what seems to be a very basic issue but couldn’t find a solution for it.

I have a 24GB GPU, and I am trying to train two independent models; each uses only 2GB of memory. When I train the two models simultaneously on that GPU, the training speed of each drops to around 1/5 of the speed of training one of them alone.

What could be the issue?

You might be creating new bottlenecks in your code, e.g. if both applications are trying to load data while your storage isn't fast enough to feed them both.
You could profile the scripts to see where the bottleneck is created.
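A minimal sketch of how you could profile one of the training scripts with Python's built-in `cProfile` to see whether time is spent in data loading or in the actual training step. The `load_batch`, `train_step`, and `train_one_epoch` functions here are hypothetical stand-ins for your real code:

```python
import cProfile
import pstats
import time

def load_batch():
    # Stand-in for a data-loading step; replace with your real DataLoader call.
    time.sleep(0.001)
    return [0.0] * 1024

def train_step(batch):
    # Stand-in for the forward/backward pass.
    return sum(batch)

def train_one_epoch(num_batches=100):
    for _ in range(num_batches):
        batch = load_batch()
        train_step(batch)

profiler = cProfile.Profile()
profiler.enable()
train_one_epoch()
profiler.disable()

# Show the 5 most expensive calls by cumulative time; if data loading
# dominates here, the GPU is likely starved rather than oversubscribed.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

If the data-loading functions dominate cumulative time while both scripts run, the disk (not the GPU) is the shared bottleneck.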

Also, to parallelize CUDA workloads on the GPU, you would need to make sure enough compute resources are free. E.g. if one script uses all SMs in, say, a matmul operation, the other script's kernels won't be able to run concurrently, so you shouldn't expect perfect parallelization (even without other bottlenecks), but it of course depends on the actual use case.
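To build intuition for this kind of contention, here is a rough CPU analogy (an assumption for illustration, not a GPU measurement): two compute-bound workers sharing the same fixed pool of compute units each run slower than one worker alone, just as two kernels competing for the same SMs do. The function names are hypothetical:

```python
import multiprocessing as mp
import time

def busy_work(n):
    # CPU-bound stand-in for a kernel that saturates the compute units.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed_run(num_procs, n=2_000_000):
    # Run `num_procs` copies of the workload concurrently and time them.
    start = time.perf_counter()
    with mp.Pool(num_procs) as pool:
        pool.map(busy_work, [n] * num_procs)
    return time.perf_counter() - start

if __name__ == "__main__":
    solo = timed_run(1)
    both = timed_run(2)
    # On a machine where the workers exhaust the available cores, the
    # two-worker run takes noticeably longer per worker than the solo run.
    print(f"1 worker: {solo:.2f}s, 2 workers: {both:.2f}s")
```

The same principle applies on the GPU: concurrency only helps if the first workload leaves SMs idle for the second one to use.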

Maybe you are not using an NVIDIA GPU, in which case this may be true.

What storage do you recommend?

I would use an SSD at least and avoid trying to load data from a spinning disk.
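If you want to sanity-check whether your storage can feed both training scripts, a quick sequential-read benchmark is a rough proxy. This sketch (stdlib only; the helper name is my own) writes a temporary file and measures read throughput:

```python
import os
import tempfile
import time

def read_throughput_mb_s(path, chunk_size=1 << 20):
    # Sequentially read a file in 1 MB chunks and report MB/s.
    # Note: the OS page cache can inflate this for recently written files,
    # so treat the number as an upper bound on real disk throughput.
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    elapsed = time.perf_counter() - start
    return size / (1024 * 1024) / max(elapsed, 1e-9)

# Quick self-contained check against a 16 MB temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    tmp_path = tmp.name

print(f"{read_throughput_mb_s(tmp_path):.0f} MB/s")
os.remove(tmp_path)
```

Compare the result against the combined data rate your two training loops need; a spinning disk serving two readers at once also suffers from seek overhead, which this sequential test won't capture.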