I have a distributed setup with 1 GPU per node and four nodes in total (so four GPUs overall). I noticed that it doesn’t matter whether I put device_ids and output_device in the DDP constructor, as long as I have the model device set up correctly (e.g. by calling model.to(rank)).
What’s the use of device_ids and output_device? Does setting them help with speed or something else? I’m a bit confused, and I didn’t find a more detailed explanation on the DDP API page.
If you’ve set up the model on the appropriate GPU for the rank, the device_ids arg can be omitted, as the DDP doc mentions: device_ids can also be None.
Also, it appears output_device is actually unused by DDP now, possibly after some refactoring. That argument has no effect, so it is fine to omit it as well.
I’m still a bit confused. Say I’m using a single GPU per node; then I have to set the model device correctly first, otherwise DDP complains with errors anyway. In other words, I need to do DDP(model.to(rank), ...) regardless of whether I add device_ids=[rank] or not. Is it fair to say that device_ids and output_device are just dummies (for now) with no practical effect?