Hi all,

I have a question about torch.multiprocessing.

My network is something like this:

For an input tensor x, I need to compute f_i(x) for i = 1…10, where each f_i is a sub-network and the sub-networks are mutually independent. The final output is simply the sum of the f_i(x).

However, computing this with a Python for loop,

```python
for i in range(10):
    f_i(x)
```

is slow and leads to low GPU utilization.
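Concretely, here is a stripped-down version of my setup (the layer sizes and names are made up for illustration; my real sub-networks are larger):

```python
import torch
import torch.nn as nn

class SumOfSubnets(nn.Module):
    """Sum of mutually independent sub-networks f_i(x)."""
    def __init__(self, in_dim=32, hidden=64, out_dim=8, n_subnets=10):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden),
                          nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(n_subnets)
        )

    def forward(self, x):
        # This sequential loop over the sub-networks is the slow part.
        return sum(f(x) for f in self.subnets)

model = SumOfSubnets()
x = torch.randn(4, 32)
y = model(x)
print(y.shape)  # torch.Size([4, 8])
```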

It seems that torch.multiprocessing could be a feasible way to solve this, but I have found very little material on it. Could someone kindly provide a simple demo? Thank you very much.

By the way, I only have one GPU, so what I want to do is run many small networks in parallel on a single GPU.
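From what I have read, CUDA streams might let the small kernels overlap on one device, but I am not sure this is the right tool. A sketch of what I mean (the stream usage here is my guess, not something I have verified to give a speedup):

```python
import torch

def parallel_sum(subnets, x):
    """Run each sub-network on its own CUDA stream, then sum the outputs.
    Hypothetical sketch: falls back to a plain sequential loop without CUDA."""
    if not torch.cuda.is_available():
        return sum(f(x) for f in subnets)
    streams = [torch.cuda.Stream() for _ in subnets]
    outputs = []
    for f, s in zip(subnets, streams):
        # Kernels launched on different streams may execute concurrently.
        with torch.cuda.stream(s):
            outputs.append(f(x))
    torch.cuda.synchronize()  # wait for all streams before summing
    return torch.stack(outputs).sum(dim=0)
```

(A proper version would presumably also need to synchronize the side streams with the default stream that produced x; this is just to illustrate the idea.)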