How can I train 2 submodules in parallel with 2 GPUs - one per GPU?

Hi, I have loaded my 2 submodules onto 2 GPUs, one per GPU, as below:

submodule1.cuda(0)
submodule2.cuda(1)

and in the forward pass

out1 = submodule1(x1)
out2 = submodule2(x2)

In the above case, GPU1 is idle while GPU0 is processing. Is it possible to run the above 2 in parallel and not one after the other?

@nathan you may need to start two threads, each managing one GPU. Although CUDA calls are asynchronous by default, your module might introduce a Python-level synchronization point that forces the second call to wait for the first.
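Here is a minimal sketch of the two-thread idea, not a definitive implementation. The nn.Linear layers, the tensor shapes, and the helper names pick_devices and run_submodule are hypothetical stand-ins for your real submodules, and the sketch falls back to CPU when two GPUs are not visible, so it still runs but without any parallel speedup.

```python
import threading

import torch
import torch.nn as nn


def pick_devices():
    # Use cuda:0 and cuda:1 when two GPUs are visible; otherwise fall back
    # to CPU so the sketch still runs (no real parallelism in that case).
    if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    return torch.device("cpu"), torch.device("cpu")


def run_submodule(module, x, results, key):
    # Each thread touches only its own module and input, which already
    # live on the same device, so no cross-device copies happen here.
    results[key] = module(x)


dev0, dev1 = pick_devices()

# Hypothetical stand-ins for submodule1 / submodule2 from the question.
submodule1 = nn.Linear(16, 4).to(dev0)
submodule2 = nn.Linear(16, 4).to(dev1)
x1 = torch.randn(8, 16, device=dev0)
x2 = torch.randn(8, 16, device=dev1)

results = {}
threads = [
    threading.Thread(target=run_submodule, args=(submodule1, x1, results, "out1")),
    threading.Thread(target=run_submodule, args=(submodule2, x2, results, "out2")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both forward passes before using the outputs

out1, out2 = results["out1"], results["out2"]

# Backward is called once from the main thread; the second output is moved
# to the first device only to form a single scalar loss for illustration.
loss = out1.sum() + out2.to(dev0).sum()
loss.backward()
```

Whether this actually beats the sequential version depends on how much work each forward pass does. For small modules the thread startup cost can outweigh any overlap, so it is worth timing both variants on your real model.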
Could you post some sample code for your submodule?

Thanks @Stone. I created a Python multiprocessing Pool to execute them in parallel.

But the problem was that executing them with the Pool took longer than executing them sequentially!

Probably due to the bookkeeping involved with Pool. Any thoughts?

Hey nathan, I was thinking of doing the same: I have 2 models that I want to optimize in parallel using 2 GPUs. You said it takes longer. Was that a bug, or is it really the case? If it is, I will not bother with it, unless it is easy to implement and try.