Example for DistributedDataParallel?

I usually like the PyTorch documentation, but I think the part about DistributedDataParallel is terrible. I have already abandoned several approaches to making it work, and at the moment I really depend on it, so some help would be amazing!

None of the tutorials I have seen so far show a way to specify which GPUs to use. I am working on a server with 10 GPUs, but they are not mine and I can usually only use 4 if I am lucky. So I need some way to specify exactly which GPUs to use, and this is covered nowhere in the tutorials.
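The only workaround I have found so far is to hide the GPUs I am not allowed to touch via CUDA_VISIBLE_DEVICES (this is my own guess, not something from the tutorials, and the device numbers below are just an example):

import os

# Hide everything except the physical GPUs I may use, e.g. 2, 5, 6 and 7.
# This has to happen before CUDA is initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,5,6,7"

import torch
print(torch.cuda.device_count())  # now reports 4; they are re-indexed as 0..3

Is that really the intended way to do it with DistributedDataParallel, or is there a proper option for this?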

The official documentation [DistributedDataParallel — PyTorch master documentation] says the following:

torch.cuda.set_device(i)
torch.distributed.init_process_group(
    backend='nccl', world_size=N, init_method='...')
model = DistributedDataParallel(model, device_ids=[i], output_device=i)

It says further:

In order to spawn up multiple processes per node, you can use either torch.distributed.launch or torch.multiprocessing.spawn.

But the documentation over there is even more complicated… Is there no easier way to do this? Could someone please help me out here or point me to a useful example? Thanks a lot!!
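For reference, here is my best guess at how the snippet from the docs, torch.multiprocessing.spawn and my GPU restriction are supposed to fit together. The worker function, the toy model, the master address/port and the hard-coded GPU list are all my own assumptions, so please correct me if this is not how it is meant to be used:

import os

# My assumption: only these four physical GPUs are free for me.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,5,6,7"

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    # One process per GPU; ranks 0..world_size-1 map onto the visible GPUs 0..3.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Toy model, just to have something to wrap.
    model = torch.nn.Linear(10, 10).to(rank)
    ddp_model = DDP(model, device_ids=[rank], output_device=rank)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(8, 10, device=rank)
    loss = ddp_model(inputs).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 4  # one process per GPU I am allowed to use
    mp.spawn(worker, args=(world_size,), nprocs=world_size)

Does this go in the right direction, or am I missing something fundamental?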