Multi-GPU on a single GPU: is there a way to use the remaining GPU RAM to run multi-GPU training for a single algorithm?

If algorithm A uses 15GB out of 40GB, can I allocate the remaining 25GB as if it were a second GPU, carving another 15GB out of that 25GB pool to run multi-GPU training?
I am running Latent Diffusion; however, my data only works with a batch size smaller than 8, so I'm only using around 11GB out of 40GB. This feels quite wasteful because I'm on Colab Pro+, where the cost is calculated per hour rather than per GB of RAM per hour.
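I'm reading the memory usage with something like this in a Colab cell:

```
!nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```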

I’m not familiar enough with Colab and don’t know if and how you can launch multiple processes on the same instance and GPU, but generally it’s possible to run multiple processes on the same GPU.
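For example, something along these lines should start two independent runs on the same card (an untested sketch; `i2i_b.yaml` is a made-up second config), provided both fit in memory together:

```
# Two independent training runs sharing GPU 0; logs go to separate files.
# (i2i_b.yaml is a hypothetical second config -- substitute your own.)
!nohup python main.py --base ./configs/i2i.yaml -t --gpus 0, > run_a.log 2>&1 &
!nohup python main.py --base ./configs/i2i_b.yaml -t --gpus 0, > run_b.log 2>&1 &
```

Note that the two processes time-share the GPU's compute, so each run is slower than it would be alone; you gain throughput across experiments, not speed for a single one.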

My intention is not to use the remaining RAM to run multiple processes, but rather to use that remaining portion to further accelerate the one algorithm that is already running.
For example: !python main.py --base ./configs/i2i.yaml -t --gpus 0, (using only 12GB out of 40GB).
My intention is to allocate the remaining 28GB, so that I could modify the initial command to:
!python main.py --base ./configs/i2i.yaml -t --gpus 0,1, (24GB/40GB). I'm not sure whether there's a way to do this, because, as I understand it, the more GPUs, the faster the training.

Admittedly, I'm not proficient in multi-processing, so if I've misunderstood anything here, please point it out. I appreciate it. :sweat_smile:

Passing --gpus 0,1 only helps if a second physical GPU exists; spare memory on the same card doesn't act as another device. To put that memory to work on a single run, increase the batch size or use a larger model.
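As a rough illustration (a toy model, not the Latent Diffusion code), peak memory grows with batch size, so a bigger batch is the natural way to soak up the spare VRAM:

```python
import torch

# Toy stand-in for one training step; not the latent-diffusion model.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 3, 3, padding=1),
).cuda()

for batch_size in (4, 8, 16):
    torch.cuda.reset_peak_memory_stats()
    model.zero_grad(set_to_none=True)
    x = torch.randn(batch_size, 3, 256, 256, device="cuda")
    model(x).square().mean().backward()  # activations + grads dominate memory
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"batch {batch_size}: peak {peak_gib:.2f} GiB")
```

If your i2i.yaml follows the usual latent-diffusion config layout, the batch size should be under data.params.batch_size in the YAML.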