My question is not about training. I am writing demo code with two models I have already trained.
To speed up the code, I want to load the two models on two different GPUs and feed each one a different input at the same time, then concatenate the two results into a single output.
I tried torch.multiprocessing, but got:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
Then I added mp.set_start_method('spawn'), but I still got:
RuntimeError: context has already been set
My code structure is like:
def function(queue, img, net, gpu_id):
    y = net(Variable(torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)).cuda(gpu_id))
    queue.put(y.cpu().data.numpy())

if __name__ == '__main__':
    net1 = net1.cuda(0)
    net2 = net2.cuda(1)
    queue = mp.Queue()
    process1 = mp.Process(target=function, args=(queue, img1, net1, 0))
    process2 = mp.Process(target=function, args=(queue, img2, net2, 1))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    output1 = queue.get()
    output2 = queue.get()
    output = np.concatenate((output1, output2), 0)
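For reference, here is a minimal CPU-only repro of the structure I am aiming for. It is a sketch under two assumptions: that the "context has already been set" error comes from calling mp.set_start_method() after a start method was already fixed, and that mp.get_context('spawn') (a local context, which does not touch the global start method) is the right way around it. The worker's doubling step is a hypothetical stand-in for the real net(...).cuda(gpu_id) call.

```python
import multiprocessing as mp

import numpy as np


def worker(queue, idx, x):
    # Stand-in for: y = net(torch.from_numpy(x)...).cuda(gpu_id); queue.put(y.cpu().data.numpy())
    queue.put((idx, x * 2))


def run_demo():
    # Local spawn context instead of mp.set_start_method('spawn'),
    # so there is no "context has already been set" error.
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    p1 = ctx.Process(target=worker, args=(queue, 0, np.ones((1, 3))))
    p2 = ctx.Process(target=worker, args=(queue, 1, np.full((1, 3), 2.0)))
    p1.start()
    p2.start()
    # Drain the queue BEFORE join(): a child that still holds un-consumed
    # queue data may not exit, so join() before get() can deadlock.
    results = dict(queue.get() for _ in range(2))
    p1.join()
    p2.join()
    return np.concatenate((results[0], results[1]), 0)


if __name__ == '__main__':
    print(run_demo())  # [[2. 2. 2.], [4. 4. 4.]]
```

The tagging with idx is because the two workers can finish in either order, so two bare queue.get() calls do not guarantee which output arrives first.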
How can I correctly run inference with multiprocessing across multiple GPUs?
Thanks a lot!!