Run Pytorch on Multiple GPUs

ptrblck · February 9, 2019, 2:53pm

torch.device('cuda') will use the default CUDA device. It should be the same as cuda:0 in the default setup. However, if you are using a context manager as described in this example (e.g. with torch.cuda.device(1):), 'cuda' will refer to the specified device.
In the default context, they will be the same. However, I think input.cuda() will also behave like the default device as in point 1. I would recommend to stick to the .to() operator, as the code is quite easy to be written in a device-agnostic way.
I’m unfortunately not familiar with torchtext, but based on the doc, your suggestion makes sense. Let’s wait for other answers on this point.
Yes, that’s right. You’ll see an unbalanced GPU usages as beautifully explained by @Thomas_Wolf in his blog post.
Regarding nn.DistributedDataParallel I try to stick to the NVIDIA apex examples. I’m currently not sure, if there is still a difference between the apex and PyTorch implementation of DistributedDataParallel or if they are on par now. Maybe @mcarilli or @ngimel might have an answer for this point.
I’m not sure and would guess not. However, I’ve seen some paper explaining the momentum might be adapted for large batch sizes. Take this info with a grain of salt and let’s hear other opinions.