I have two servers with 3 GPUs each. My code runs fine when I use all GPUs on both servers (6 GPUs in total). I want to benchmark it using 2 GPUs per server (4 GPUs in total) and 1 GPU per server (2 GPUs in total).
ngpus_per_node = 1  # or can be 2 or 3
args.world_size = ngpus_per_node * args.world_size  # args.world_size is passed in as 2 (for 2 machines)
When I use all GPUs on each machine it works fine, but with fewer GPUs per machine the code hangs at the following line without any error:
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
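
To make the setup concrete, here is a minimal sketch of the rank arithmetic implied by the snippet above. It assumes the common one-process-per-GPU layout, where each process's global rank is `node_rank * ngpus_per_node + gpu`. The helper `global_ranks` is hypothetical and not part of the original code. It only shows which ranks would have to be started for a given `ngpus_per_node`.

```python
def global_ranks(num_nodes, ngpus_per_node):
    """Return (world_size, {node_rank: [global ranks]}) for a uniform layout.

    Assumption: one process per GPU, with
    global rank = node_rank * ngpus_per_node + local gpu index.
    """
    world_size = num_nodes * ngpus_per_node
    ranks = {
        node: [node * ngpus_per_node + gpu for gpu in range(ngpus_per_node)]
        for node in range(num_nodes)
    }
    return world_size, ranks


if __name__ == "__main__":
    # Two machines, using 3, 2, or 1 GPU(s) on each.
    for n in (3, 2, 1):
        world_size, ranks = global_ranks(num_nodes=2, ngpus_per_node=n)
        print(f"ngpus_per_node={n}: world_size={world_size}, ranks={ranks}")
```

In this layout, the number of processes actually launched on each machine and the rank computed in each process both have to use the same `ngpus_per_node`. If the world size is 4 but the ranks that join are not exactly 0 through 3, collective operations (such as the parameter synchronization that `DistributedDataParallel` performs when it is constructed) can wait indefinitely. Whether that is what happens here is not confirmed by the code shown.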