Hi all,

I have a question about torch.multiprocessing.

My network is something like this:

For an input tensor x, I need to compute f_i(x) for i = 1…10, where each f_i is a sub-network and the sub-networks are mutually independent. The final output is simply the sum of the f_i(x).

However, computing this with a Python for loop,

```python
for i in range(10):
    f_i(x)
```

is slow and leads to low GPU utilization.
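Concretely, here is a stripped-down version of my setup (the layer sizes and names are made up for illustration; my real sub-networks are larger):

```python
import torch
import torch.nn as nn

class SumOfSubnets(nn.Module):
    """Sum of mutually independent sub-networks f_i(x)."""
    def __init__(self, in_dim=32, hidden=64, out_dim=8, n_subnets=10):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden),
                          nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(n_subnets)
        )

    def forward(self, x):
        # This sequential loop over the sub-networks is the slow part.
        return sum(f(x) for f in self.subnets)

model = SumOfSubnets()
x = torch.randn(4, 32)
y = model(x)
print(y.shape)  # torch.Size([4, 8])
```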

It seems that torch.multiprocessing could be a feasible way to solve this, but I have found very little material on it. Could someone kindly provide a simple demo? Thank you very much.

By the way, I only have one GPU, so what I want to do is run many small networks in parallel on a single GPU.
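From what I have read, CUDA streams might let the small kernels overlap on one device, but I am not sure this is the right tool. A sketch of what I mean (the stream usage here is my guess, not something I have verified to give a speedup):

```python
import torch

def parallel_sum(subnets, x):
    """Run each sub-network on its own CUDA stream, then sum the outputs.
    Hypothetical sketch: falls back to a plain sequential loop without CUDA."""
    if not torch.cuda.is_available():
        return sum(f(x) for f in subnets)
    streams = [torch.cuda.Stream() for _ in subnets]
    outputs = []
    for f, s in zip(subnets, streams):
        # Kernels launched on different streams may execute concurrently.
        with torch.cuda.stream(s):
            outputs.append(f(x))
    torch.cuda.synchronize()  # wait for all streams before summing
    return torch.stack(outputs).sum(dim=0)
```

(A proper version would presumably also need to synchronize the side streams with the default stream that produced x; this is just to illustrate the idea.)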