What is the use of `device_ids` in DDP constructor?

I have a distributed setup with one GPU per node and four nodes (so four GPUs) in total. I noticed that it doesn’t seem to matter whether I pass device_ids and output_device to the DDP constructor, as long as the model is already on the correct device (e.g. via model.to(rank)).

What’s the use of device_ids and output_device? Does setting them help with speed, or something else? I’m a bit confused, and I couldn’t find a more detailed explanation on the DDP API page.
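For context, here is a minimal sketch of the kind of setup I mean (assuming a torchrun-style launcher that sets RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK; the Linear model is just a placeholder):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes a launcher (e.g. torchrun) has set RANK, WORLD_SIZE,
# MASTER_ADDR, MASTER_PORT, and LOCAL_RANK in the environment.
dist.init_process_group(backend="nccl")

# With one GPU per node, the local device index is 0 on every node;
# on a single multi-GPU node it would equal the global rank.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 10).to(local_rank)  # toy model, moved to the GPU first

# Explicitly passing device_ids / output_device, as in the question:
ddp_model = DDP(model, device_ids=[local_rank], output_device=local_rank)
```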

If you’ve already placed the model on the appropriate GPU for the rank, the device_ids argument can be omitted, as the DDP documentation mentions:

> Alternatively, device_ids can also be None.

Also, output_device actually appears to be unused by DDP now, possibly after some refactoring. Passing it has no effect, so it is fine to omit it as well.
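Concretely, for a single-device module that is already on the right GPU, all of the following should construct DDP the same way (reusing model and local_rank from the sketch above; this is my reading of the behavior, not a quote from the docs):

```python
# All three behave identically for a module already on GPU local_rank:
ddp_model = DDP(model, device_ids=[local_rank], output_device=local_rank)
ddp_model = DDP(model, device_ids=[local_rank])  # output_device omitted
ddp_model = DDP(model)  # device_ids=None: device inferred from the module's parameters
```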

I’m still a bit confused. Say I’m using a single GPU per node: then I have to set up the model’s device correctly first, otherwise DDP complains with errors anyway. In other words, I need to do DDP(model.to(rank), ...) regardless of whether I also pass device_ids=[rank]. Is it fair to say that device_ids and output_device are just dummies (for now) and have no practical effect?