Train two or more neural networks in parallel

I am developing a deep learning framework whose design includes multiple neural networks.

Is there any way in PyTorch to run multiple neural networks in parallel?

Currently, I have my models listed and trained sequentially, but ideally I would like them to be trained in parallel. For example, I want to train model A and model B together. Their intermediate results are then consumed by model C and model D, and the results from model C and model D are in turn consumed by model F to generate the final output.
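
Roughly, this is a sketch of the data flow I have in mind (the module class and sizes below are just placeholders, not my actual models):

```python
import torch
import torch.nn as nn

# Placeholder blocks standing in for my real models A, B, C, D and F.
class Block(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

model_a = Block(16, 32)
model_b = Block(16, 32)
model_c = Block(64, 32)   # consumes the outputs of A and B
model_d = Block(64, 32)   # consumes the outputs of A and B
model_f = Block(64, 1)    # consumes the outputs of C and D

x = torch.randn(8, 16)

# What I do now: everything runs one after the other.
out_a = model_a(x)
out_b = model_b(x)
ab = torch.cat([out_a, out_b], dim=1)
out_c = model_c(ab)
out_d = model_d(ab)
final = model_f(torch.cat([out_c, out_d], dim=1))
```

I would like A and B (and later C and D) to be trained at the same time instead of one after the other.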

Any idea?

Then just pass the outputs from A and B to C.
CUDA calls are asynchronous. If you run A.forward() and then B.forward(), that is already async. The main problem is that both will use the same GPU, so each gets roughly half the throughput. In short, there is no gain at all between sequential and parallel if you don't have additional resources.
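
A minimal sketch of what I mean (assuming a CUDA device is available; A and B here are just placeholder linear layers): both forward calls return as soon as the kernels are enqueued, and the CPU only blocks when it actually needs the values on the host.

```python
import torch
import torch.nn as nn

device = torch.device("cuda")  # assumes a GPU is available
A = nn.Linear(1024, 1024).to(device)
B = nn.Linear(1024, 1024).to(device)
x = torch.randn(256, 1024, device=device)

# Both calls return almost immediately; the kernels are queued on the GPU.
out_a = A(x)
out_b = B(x)

# The CPU only blocks here, when the results are actually needed on the host.
print(out_a.sum().item(), out_b.sum().item())
```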

That's true in your scenario, but my proposed framework has a few advantages from such a parallel design.

So, no one knows how to train 2 or more neural networks at the same time?

I see; anyway, as I already told you: when you call forward, the GPU processes it asynchronously, so everything already runs in parallel as much as possible :slight_smile:

I haven't used asynchronous forward calls to train multiple neural networks together; if you could provide some insight, it would be much appreciated.

For example, neural networks A and B are to be trained together. Then, neural network C takes the outputs from A and B for the next step's training. How can I write that in PyTorch?

Another question: in a Jupyter Notebook the code is executed sequentially. If I have A.forward and then B.forward, the notebook runs A first and only runs B after A's execution is finished. I am not sure how I could do this in a Jupyter Notebook.

I have no idea how this sequential/parallel/async processing works.

You can find a detailed explanation here:
https://pytorch.org/docs/master/notes/cuda.html
The short version is that CUDA ops are enqueued internally and executed asynchronously with respect to the CPU, so a call returns before the corresponding GPU work has finished.
Some ops force synchronization.
That's what I mean by "they are async".
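
A quick way to see this (just a sketch; the layer sizes are arbitrary) is to time a forward pass with and without torch.cuda.synchronize(). Without the synchronize, the timer mostly measures how long it takes to enqueue the kernels, not to run them:

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda")  # assumes a GPU is available
model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(20)]).to(device)
x = torch.randn(512, 2048, device=device)

# Naive timing: the call returns as soon as the work is enqueued.
t0 = time.time()
y = model(x)
print("without sync:", time.time() - t0)

# Correct timing: wait for all queued GPU work to finish before and after.
torch.cuda.synchronize()
t0 = time.time()
y = model(x)
torch.cuda.synchronize()
print("with sync:", time.time() - t0)
```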

Regarding using outputs as inputs: you can do that. Autograd will construct a graph and backpropagate properly even though the tensors are not hosted by the same nn.Module; autograd works at the tensor level.
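
Here is a minimal sketch (placeholder module names and sizes): C consumes the outputs of A and B, and a single loss.backward() populates gradients in all three modules, because autograd follows the tensors, not the modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

A = nn.Linear(16, 32)
B = nn.Linear(16, 32)
C = nn.Linear(64, 1)

x = torch.randn(8, 16)
target = torch.randn(8, 1)

out_a = A(x)
out_b = B(x)
out_c = C(torch.cat([out_a, out_b], dim=1))  # outputs of A and B feed C

loss = F.mse_loss(out_c, target)
loss.backward()  # one graph; gradients reach A, B and C

print(A.weight.grad is not None, B.weight.grad is not None, C.weight.grad is not None)
```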

If you just want to set up a daemon process (that is, a thread that runs in the background), I don't think you can do that in a trainable way. And, once again, even if you could, it wouldn't be faster than running the models "sequentially". With a properly designed training pipeline, GPU usage should be ~100%, so splitting the work into daemon processes ends up taking the same time.

Lastly, you can create 3 nn.Modules, 3 optimizers, and 3 of everything else if you need to.
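
For example (again just a sketch with placeholder modules), one optimizer per module, all stepped in the same iteration; a single optimizer over the combined parameters would work just as well:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

A, B, C = nn.Linear(16, 32), nn.Linear(16, 32), nn.Linear(64, 1)
opt_a = torch.optim.Adam(A.parameters(), lr=1e-3)
opt_b = torch.optim.Adam(B.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(C.parameters(), lr=1e-3)

x, target = torch.randn(8, 16), torch.randn(8, 1)

for opt in (opt_a, opt_b, opt_c):
    opt.zero_grad()

loss = F.mse_loss(C(torch.cat([A(x), B(x)], dim=1)), target)
loss.backward()

for opt in (opt_a, opt_b, opt_c):
    opt.step()
```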