In the ImageNet example linked here https://github.com/pytorch/examples/blob/master/imagenet/main.py, main_worker is launched through mp.spawn. How does main_worker get its gpu argument? When I run this on 2 nodes with 2 GPUs each, that parameter is always None. It works on multiple nodes with a single GPU each.
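torch.multiprocessing.spawn passes the process index (an integer from 0 to nprocs - 1, not an OS process ID) as the first argument to the target function, followed by whatever you pass in args. In this example, that index is what main_worker receives as gpu. Here is the API doc: https://pytorch.org/docs/stable/multiprocessing.html#torch.multiprocessing.spawn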
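
To make the calling convention concrete, here is a minimal sketch in plain Python. It does not use torch or start real processes: spawn_sketch is a hypothetical stand-in that only mimics the documented fn(i, *args) call, and run_node and node_rank are names made up for the illustration. The global rank formula (node rank times GPUs per node, plus the local index) is the usual convention and is assumed here rather than copied from the example.

```python
from types import SimpleNamespace


def spawn_sketch(fn, args=(), nprocs=1):
    """Hypothetical stand-in for torch.multiprocessing.spawn.

    It only mimics the calling convention fn(i, *args) for i in range(nprocs).
    It runs sequentially in this process and returns the results as a list,
    which the real spawn does not do.
    """
    return [fn(i, *args) for i in range(nprocs)]


def main_worker(gpu, ngpus_per_node, args):
    # gpu is the index injected by the spawner, not a command-line value.
    # Assumed convention: global rank = node rank * GPUs per node + local index.
    global_rank = args.node_rank * ngpus_per_node + gpu
    return gpu, global_rank


def run_node(node_rank, ngpus_per_node):
    # args only carries node_rank here; the real script passes its parsed options.
    args = SimpleNamespace(node_rank=node_rank)
    # Note that gpu is NOT part of args: the spawner prepends the index itself.
    return spawn_sketch(main_worker, args=(ngpus_per_node, args), nprocs=ngpus_per_node)


if __name__ == "__main__":
    for node_rank in range(2):  # 2 nodes with 2 GPUs each, as in the question
        for gpu, global_rank in run_node(node_rank, ngpus_per_node=2):
            print(f"node {node_rank}: gpu={gpu} global_rank={global_rank}")
```

If gpu still shows up as None, one thing worth checking is whether the script was started with the multiprocessing-distributed option. As far as I can tell, without it the example calls main_worker directly with the value of the gpu option, which defaults to None, instead of going through mp.spawn.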