Hogwild on MultiGPU


I need to use the multiple GPUs available in the machine so that each process uses exactly one GPU. I modified the mnist_hogwild code https://github.com/pytorch/examples/blob/master/mnist_hogwild/main.py as follows:

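```python
import torch
import torch.multiprocessing as mp

# Fragment of main() from mnist_hogwild: Net, train, args and use_cuda come from that example.
dataloader_kwargs = {'pin_memory': True} if use_cuda else {}
dcount = torch.cuda.device_count()
devices = []
model = Net()
for i in range(dcount):
    devices.append(torch.device(f"cuda:{i}"))

# model = Net().to(device)
for i in range(dcount):
    model.to(devices[i])  # moves the whole model each time; it ends up on devices[-1]
model.share_memory()  # gradients are allocated lazily, so they are not shared here

processes = []
for rank in range(args.num_processes):
    p = mp.Process(target=train, args=(rank, args, model, devices[rank % dcount], dataloader_kwargs))
    # We first train the model across `num_processes` processes
    p.start()
    processes.append(p)
for p in processes:
    p.join()
```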

However, when I run this code with num_processes = 2 (there are two GPUs in my machine), only one of them is in use. Could you suggest what exactly I need to change in the code?
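The comment on `model.share_memory()` in the snippet says gradients are allocated lazily and are therefore not shared. A quick CPU-only check of that behaviour, using a throwaway `nn.Linear` layer as a hypothetical stand-in for `Net`:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)   # hypothetical stand-in for Net
model.share_memory()      # moves the existing parameter storage into shared memory

params_shared = [p.is_shared() for p in model.parameters()]
grads_missing = [p.grad is None for p in model.parameters()]
print(params_shared)  # [True, True]
print(grads_missing)  # [True, True]

# The first backward pass allocates .grad in ordinary, process-local memory.
model(torch.randn(2, 3)).sum().backward()
grads_shared = [p.grad.is_shared() for p in model.parameters()]
print(grads_shared)   # [False, False]
```

So in the Hogwild example the weights are shared between processes, while each process ends up with its own private `.grad` tensors.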

Please review my version.

I’m happy to fix issues and improve readability.

This really is derived from an RL implementation by @dgriff available here

This snippet first moves the model to device 0 and then to device 1, so after the loop it lives on device 1 only. If you don’t explicitly move the model inside the function you’re running through multiprocessing, you’ll have to make the device choice depend on the rank of the target process. As written, I assume you’re only using device 1.
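One way to make the device depend on the rank is sketched below. This is an illustrative sketch under stated assumptions, not code from this thread: it swaps in the pattern common in A3C-style RL code, where the shared model stays on the CPU in shared memory and each process picks its GPU as `rank % torch.cuda.device_count()`, keeps a private copy of the model on that GPU, and copies its gradients back to the shared CPU parameters before an in-place SGD step. `Net`, `device_for_rank`, this `train` signature and the random batches are all hypothetical, and the code falls back to the CPU when no GPU is present. Simply calling `.to(device)` on the shared model inside each process would not give Hogwild-style sharing, because it replaces the parameters with new, process-local tensors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.multiprocessing as mp


class Net(nn.Module):
    # Hypothetical stand-in for the mnist_hogwild Net (a tiny linear classifier).
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


def device_for_rank(rank):
    """Round-robin GPU assignment by rank; CPU if no GPU is visible."""
    n = torch.cuda.device_count()
    return torch.device(f"cuda:{rank % n}") if n > 0 else torch.device("cpu")


def train(rank, shared_model, steps=20, lr=0.1):
    """Hypothetical worker: the device is chosen inside the process from its rank."""
    torch.manual_seed(rank)
    device = device_for_rank(rank)
    local_model = Net().to(device)  # private copy on this process's device
    # The optimizer updates the shared CPU parameters, not the local copy.
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=lr)
    for _ in range(steps):
        # Pull the latest shared weights onto this process's device.
        local_model.load_state_dict(shared_model.state_dict())
        x = torch.randn(8, 4, device=device)          # hypothetical random batch
        y = torch.randint(0, 2, (8,), device=device)
        local_model.zero_grad()
        F.cross_entropy(local_model(x), y).backward()
        # Hand this process's gradients to the shared parameters (copied to CPU).
        for shared_p, local_p in zip(shared_model.parameters(), local_model.parameters()):
            shared_p.grad = local_p.grad.detach().cpu().clone()
        optimizer.step()  # lock-free, in-place update of the shared weights


if __name__ == "__main__":
    shared_model = Net()          # stays on the CPU
    shared_model.share_memory()
    ctx = mp.get_context("spawn")  # CUDA in subprocesses needs spawn/forkserver
    processes = []
    for rank in range(2):          # e.g. one process per GPU
        p = ctx.Process(target=train, args=(rank, shared_model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```

Run it as a script. The `spawn` start method is used because the PyTorch documentation states that the CUDA runtime does not support `fork` for using CUDA in subprocesses.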

Hi, would you mind explaining why the shared optimizer is necessary please?