Hi,
My question is not about training. I am writing demo code with two models I have already trained.
To speed up the demo, I want to load the two models on two different GPUs, feed each one a different input at the same time, and then concatenate the two results into a single output.
I tried torch.multiprocessing, but I got: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method.
Then I added mp.set_start_method('spawn'), but that only raised another error: RuntimeError: context has already been set.
My code structure looks like this (imports added for completeness; net1, net2, img1, img2 are loaded earlier and omitted here):

import numpy as np
import torch
import torch.multiprocessing as mp
from torch.autograd import Variable

def function(queue, img, net, gpu_id):
    # run one model on its own GPU and return the result via the queue
    y = net(Variable(torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)).cuda(gpu_id))
    queue.put(y.cpu().data.numpy()[0])

if __name__ == '__main__':
    net1 = net1.cuda(0)
    net2 = net2.cuda(1)
    queue = mp.Queue()
    process1 = mp.Process(target=function, args=(queue, img1, net1, 0))
    process2 = mp.Process(target=function, args=(queue, img2, net2, 1))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
    output1 = queue.get()
    output2 = queue.get()
    output = np.concatenate((output1, output2), 0)
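For reference, the standard-library multiprocessing docs describe mp.get_context('spawn') as a way to select the start method locally without touching the global context, which should avoid the "context has already been set" error. A minimal CPU-only sketch of that pattern (worker is a toy stand-in for the model calls, not my real code):

```python
import multiprocessing as mp

def worker(queue, x, gpu_id):
    # toy stand-in for net(...).cuda(gpu_id): just doubles the input
    queue.put((gpu_id, x * 2))

if __name__ == '__main__':
    # get_context returns a context object bound to 'spawn' without changing
    # the global start method, so it cannot raise "context has already been set"
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()
    p1 = ctx.Process(target=worker, args=(queue, 10, 0))
    p2 = ctx.Process(target=worker, args=(queue, 20, 1))
    p1.start()
    p2.start()
    # drain the queue before join() to avoid a potential deadlock when
    # children block while putting large items into the queue
    results = dict(queue.get() for _ in range(2))
    p1.join()
    p2.join()
    print(results)  # results == {0: 20, 1: 40}
```

I am not sure whether this is enough on its own, since torch.multiprocessing wraps the standard module and CUDA tensors have extra sharing constraints.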
How can I implement multiprocessing across multiple GPUs correctly?
Thanks a lot!!
