How can I make my faster GPU process two batches while my slower GPU processes one?

Hello everyone, I hope you are all doing well. I have the following question:

I have a system with two GPUs: GPU0 is a Quadro RTX 8000 and GPU1 is an RTX A6000.
The RTX A6000 is faster than the Quadro 8000. When I benchmarked each GPU on its own (ResNet50 on the ImageNet dataset), the RTX A6000 was roughly 2x faster than the Quadro 8000:

RTX A6000: 32 min, batch size 196
Quadro 8000: 65 min, batch size 196
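As a quick sanity check, the two timings above imply a split of roughly 2:1. This assumes both runs covered the same workload (same model, data, and batch size of 196); the variable names are only illustrative:

```python
# Benchmark times from the post (same batch size, 196, on each GPU).
time_a6000_min = 32
time_quadro_min = 65

# Relative throughput: how many batches the A6000 finishes per Quadro batch.
speed_ratio = time_quadro_min / time_a6000_min
print(f"A6000 is {speed_ratio:.2f}x faster")

# Fraction of each step's batches each GPU should take so both finish together.
a6000_share = speed_ratio / (speed_ratio + 1)
quadro_share = 1 / (speed_ratio + 1)
print(f"A6000 share: {a6000_share:.1%}, Quadro share: {quadro_share:.1%}")
```

With a ratio of about 2.03, giving the A6000 2 of every 3 batches is a close fit.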

I want to train on both GPUs together, but I don’t want the RTX A6000 to be underused just because the Quadro 8000 is relatively slower. Can I change my batch sampler to handle this?
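One way to get a 2:1 split is a custom batch sampler. The sketch below is plain Python and entirely hypothetical (the class name and its parameters are mine, not a PyTorch API). It assumes one training process per GPU, with rank 1 driving the A6000 (GPU1) and rank 0 the Quadro 8000 (GPU0), and it keeps the per-batch size identical on both GPUs so the A6000's memory use per batch does not grow.

```python
import random
from typing import Iterator, List


class UnevenBatchSampler:
    """Hypothetical two-GPU batch sampler (not a PyTorch class).

    Each optimizer step consumes fast_batches + slow_batches batches from one
    shared shuffle; the process on `fast_rank` gets the first `fast_batches`,
    the other process gets the remaining `slow_batches`.
    """

    def __init__(self, dataset_len: int, batch_size: int, rank: int,
                 fast_rank: int = 1, fast_batches: int = 2,
                 slow_batches: int = 1, seed: int = 0) -> None:
        self.dataset_len = dataset_len
        self.batch_size = batch_size
        self.rank = rank
        self.fast_rank = fast_rank
        self.fast_batches = fast_batches
        self.slow_batches = slow_batches
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Every process must use the same seed and epoch so all see the same shuffle.
        self.epoch = epoch

    def _num_steps(self) -> int:
        # Whole optimizer steps per epoch; leftover batches are dropped.
        full_batches = self.dataset_len // self.batch_size
        return full_batches // (self.fast_batches + self.slow_batches)

    def __iter__(self) -> Iterator[List[int]]:
        indices = list(range(self.dataset_len))
        random.Random(self.seed + self.epoch).shuffle(indices)
        per_step = self.fast_batches + self.slow_batches
        for step in range(self._num_steps()):
            start = step * per_step * self.batch_size
            step_batches = [indices[start + i * self.batch_size:
                                    start + (i + 1) * self.batch_size]
                            for i in range(per_step)]
            if self.rank == self.fast_rank:
                yield from step_batches[:self.fast_batches]
            else:
                yield from step_batches[self.fast_batches:]

    def __len__(self) -> int:
        # Batches this process yields per epoch.
        per_rank = self.fast_batches if self.rank == self.fast_rank else self.slow_batches
        return self._num_steps() * per_rank
```

Because both processes perform the same number of optimizer steps per epoch, they stay in lockstep even though the A6000 sees twice as many batches. In PyTorch, each process could build its own instance with its own `rank` and pass it as `DataLoader(dataset, batch_sampler=sampler)`, since `batch_sampler` accepts an iterable that yields lists of indices; call `set_epoch(epoch)` on it at the start of every epoch.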

  • Can I make the batch sampler give more of the data to the A6000, so that the faster GPU processes 2 batches while the slower GPU processes 1 batch in the same time?

  • In that case, how should I accumulate the gradients from the 2 batches on the faster GPU and the 1 batch on the slower GPU? How do I combine them and compute the update, so that each optimizer step covers 3 batches in total?

  • When and how should I call optimizer.step() and optimizer.zero_grad() to achieve this behavior?

  • I can’t simply feed a batch twice as large to the faster GPU, because that maxes out its memory. That is, when I feed a batch of size ‘X’ to the faster GPU, because a sample size
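For the accumulation and step/zero-grad questions, a framework-free simulation may make the weighting clearer. It assumes `DistributedDataParallel`-style behavior (gradients averaged across the 2 processes during the synchronized `backward()`) and a toy one-parameter model with per-sample loss `0.5 * (w*x - y)**2`; all function names are hypothetical. The idea is to scale each micro-batch's mean loss by `len(batch) * world_size / total_samples`, so that after the cross-GPU average every sample from the 3 batches counts equally. Since the A6000 runs its 2 batches one after another and accumulates gradients, it never holds more than one batch of activations at once, which avoids the memory limit described above.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) for a toy one-parameter model


def per_sample_grad(w: float, x: float, y: float) -> float:
    # d/dw of 0.5 * (w*x - y)**2
    return (w * x - y) * x


def mean_batch_grad(w: float, batch: List[Sample]) -> float:
    # Gradient of the *mean* loss over one batch (like reduction="mean").
    return sum(per_sample_grad(w, x, y) for x, y in batch) / len(batch)


def combined_grad(w: float, fast_batches: List[List[Sample]],
                  slow_batches: List[List[Sample]]) -> float:
    world_size = 2  # assumption: one process per GPU
    total = sum(len(b) for b in fast_batches + slow_batches)

    local_grads = []
    for rank_batches in (fast_batches, slow_batches):
        g = 0.0  # plays the role of optimizer.zero_grad() at the start of the step
        for batch in rank_batches:
            # Scale the mean loss so each sample ends up weighted 1/total
            # after the average over world_size processes.
            scale = len(batch) * world_size / total
            g += scale * mean_batch_grad(w, batch)  # backward() accumulates into .grad
        local_grads.append(g)

    # Simulated all-reduce: DDP averages gradients over the processes.
    return sum(local_grads) / world_size


def train_step(w: float, lr: float, fast_batches: List[List[Sample]],
               slow_batches: List[List[Sample]]) -> float:
    # Plain SGD update, applied once per 3 batches (2 fast + 1 slow).
    return w - lr * combined_grad(w, fast_batches, slow_batches)
```

In actual DDP code, the per-step order on each process would be: `optimizer.zero_grad()`; for each assigned batch, a forward pass and `(scale * loss).backward()`, with every backward except the last wrapped in `with model.no_sync():` so gradients are all-reduced exactly once per step on both GPUs; then `optimizer.step()`. Both processes call `step()` on identical averaged gradients, so their weights stay in sync. By contrast, the common recipe of dividing the loss by the number of accumulation steps (2 on the A6000, 1 on the Quadro 8000) would give each Quadro 8000 sample twice the weight of an A6000 sample.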

I am new to all of these topics and would really appreciate any help from this community. Thank you for your time and patience.