dataloader_kwargs = {'pin_memory': True} if use_cuda else {}
dcount = torch.cuda.device_count()
devices = []
model = Net()
for i in range(dcount):
    devices.append(torch.device("cuda:" + str(i)))

torch.manual_seed(args.seed)
mp.set_start_method('spawn')

# model = Net().to(device)
for i in range(dcount):
    model.to(devices[i])
model.share_memory()  # gradients are allocated lazily, so they are not shared here

processes = []
for rank in range(args.num_processes):
    p = mp.Process(target=train, args=(rank, args, model, devices[int(rank % dcount)], dataloader_kwargs))
    # We first train the model across `num_processes` processes
    p.start()
    processes.append(p)
for p in processes:
    p.join()
However, when I run this code with num_processes = 2 (since there are two GPUs in my machine), I can see only one of them engaged. Can you please suggest what exactly I need to change in the code here?
This snippet will first move the model to device 0 and then to device 1, so after the loop it lives only on device 1. If you don't explicitly move the model to the right device inside the function you're running through multiprocessing, you'll have to make the device dependent on the rank of the target process. As is, I assume you're only using GPU 1.
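
A minimal sketch of that rank-dependent device selection (not the poster's actual script): each worker derives its device from its own rank and moves its own replica there, so with two processes and two GPUs both devices are engaged. The Net class and the training loop body below are placeholders, and the sketch drops the share_memory() / Hogwild aspect of the original snippet, since a single shared parameter set cannot live on two GPUs at once.

import torch
import torch.multiprocessing as mp
import torch.nn as nn


class Net(nn.Module):  # placeholder standing in for the poster's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)


def train(rank, num_devices):
    # Pick the device from the worker's rank: with two GPUs,
    # rank 0 uses cuda:0 and rank 1 uses cuda:1.
    if num_devices > 0:
        device = torch.device("cuda:" + str(rank % num_devices))
    else:
        device = torch.device("cpu")  # fallback if no GPU is available
    model = Net().to(device)  # per-process replica on that worker's device
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(10):  # dummy loop standing in for the real training code
        x = torch.randn(32, 10, device=device)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print("rank {} trained on {}".format(rank, device))


if __name__ == "__main__":
    mp.set_start_method('spawn')
    num_devices = torch.cuda.device_count()
    processes = []
    for rank in range(2):  # num_processes = 2, as in the question
        p = mp.Process(target=train, args=(rank, num_devices))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

If you do want the shared-memory behavior of the original example, you'd instead keep the shared model where it was placed before spawning and only move data (or a per-device copy you synchronize yourself) inside each worker; the key point either way is that the device choice has to depend on the rank.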