Distributed ImageNet Example Multi GPU Question

ayushm-agrawal · April 17, 2020, 9:38pm

Hello,
In the ImageNet example linked here https://github.com/pytorch/examples/blob/master/imagenet/main.py, when main_worker is launched through mp.spawn, how does main_worker get its gpu argument? When I run this on 2 nodes with 2 GPUs each, I always see this parameter as None. It works on multiple nodes with a single GPU each.

mrshenli · April 18, 2020, 12:18am

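torch.multiprocessing.spawn passes the process index (an integer from 0 to nprocs - 1, not the OS pid) as the first argument to the target function, followed by the contents of args. That index is what main_worker receives as its gpu parameter. Here is the API doc: https://pytorch.org/docs/stable/multiprocessing.html#torch.multiprocessing.spawn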
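
To make the calling convention concrete, here is a minimal sketch that runs without PyTorch. It assumes a worker with the same parameter order as main_worker. simulate_spawn is a hypothetical in-process stand-in that only mimics how spawn calls fn(i, *args); it is not part of PyTorch and starts no processes. The node_rank field and the rank formula are illustrative assumptions about a 2-node, 2-GPU setup, not a copy of the example script.

```python
from types import SimpleNamespace


def main_worker(gpu, ngpus_per_node, args):
    # `gpu` is supplied by the spawner as the first positional argument;
    # it is the local process index, not something read from `args`.
    # Assumed rank layout: node 0 owns ranks 0..n-1, node 1 owns n..2n-1.
    global_rank = args.node_rank * ngpus_per_node + gpu
    return {"gpu": gpu, "global_rank": global_rank}


def simulate_spawn(fn, nprocs, args=()):
    # Hypothetical stand-in for the calling convention only:
    # call fn(i, *args) once for each i in range(nprocs).
    # The real call would look roughly like
    #   mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
    # and would run each call in its own process.
    return [fn(i, *args) for i in range(nprocs)]


ngpus_per_node = 2
for node_rank in range(2):  # pretend to be node 0, then node 1
    args = SimpleNamespace(node_rank=node_rank)  # hypothetical field name
    for result in simulate_spawn(main_worker, ngpus_per_node, (ngpus_per_node, args)):
        print("node", node_rank, result)
```

Note that the tuple given as args does not include the index. The spawner prepends it, so the worker takes one more parameter than the tuple has entries.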