How to specify multiple GPU usage

The documentation for nn.DataParallel is not clear to me at all. How exactly am I supposed to pass inputs and targets to the GPU if I specify concrete GPUs?

My code looks something like this:

device = torch.device('cuda:' + str(arg.gpu) if torch.cuda.is_available() else 'cpu')
model = Model(arg).to(device)
for epoch in range(epochs+1):
    for step, (original, keypoints) in enumerate(train_loader):
        original, keypoints = original.to(device), keypoints.to(device)

I want to be able to pass GPUs to the arg parser via --gpu 5 7, which produces the list [5, 7]. Simply adding the line model = nn.DataParallel(Model(arg), device_ids=[5, 7]) is not enough, since I also have to specify the device variable. However, device = torch.device('cuda:5,7' if torch.cuda.is_available() else 'cpu') results in an error. What am I supposed to do instead?
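For reference, the --gpu 5 7 parsing described above can be done with argparse's nargs; a minimal sketch (the parser setup here is assumed, not taken from the original code):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='+' collects one or more GPU ids into a list of ints
parser.add_argument('--gpu', nargs='+', type=int, default=[0])

arg = parser.parse_args(['--gpu', '5', '7'])
print(arg.gpu)  # [5, 7]
```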

Here’s how your code should look:

model = torch.nn.DataParallel(Model(arg), device_ids=arg.gpu)
for epoch in range(epochs+1):
    for step, (original, keypoints) in enumerate(train_loader):
        outputs = model(original, keypoints)
        <do some training>

Why does this work? For nn.DataParallel, the input tensors can be on any device, even the CPU. The module automatically transfers the inputs to the appropriate CUDA device during the forward pass.

What was the issue with your code? torch.device defines one and only one device. It only accepts strings of the form ‘cuda:x’, ‘cuda’, or ‘cpu’.
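A quick sketch of what torch.device does and does not accept (this runs without a GPU, since constructing a device object does not touch CUDA):

```python
import torch

# Each torch.device names exactly one device.
dev = torch.device('cuda:5')
print(dev.type, dev.index)  # cuda 5

torch.device('cpu')   # valid
torch.device('cuda')  # valid, index is None

# A comma-separated list is not a valid device string:
try:
    torch.device('cuda:5,7')
except (RuntimeError, ValueError):
    print('invalid device string')
```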

Thanks for your quick reply @Mazhar_Shaikh! One question though: during my forward pass I have to create several tensors (for example, because I produce a grid for transformations). At the moment I send them to the device as well, so that multiplications and additions with the input tensors work. How will this be affected by your code?

If you are creating tensors inside your forward pass, please use the code below:

class Model(nn.Module):
    def __init__(self, args):
        super().__init__()
        <insert initialization code>

    def forward(self, original, keypoints):
        # Create the tensor on the same device as the input
        some_tensor = torch.tensor([....], device=original.device)
        <do something with some_tensor>
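A runnable version of that pattern, using a toy model with hypothetical tensor contents (shown on CPU; because the tensor is created on original.device, the same code lands on the correct GPU inside each DataParallel replica):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)

    def forward(self, original, keypoints):
        # Created on whichever device the input shard lives on,
        # so no explicit .to(device) call is needed.
        grid = torch.tensor([[1.0, 0.0], [0.0, 1.0]], device=original.device)
        return self.linear(original @ grid + keypoints)

model = Model()
out = model(torch.randn(3, 2), torch.randn(3, 2))
print(out.shape)  # torch.Size([3, 2])
```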

Using your code produces the following error for me:

RuntimeError: module must have its parameters and buffers on device cuda:5 (device_ids[0]) but found one of them on device: cpu

The error occurs in the line outputs = model(original, keypoints).
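For context, nn.DataParallel replicates the model from device_ids[0] on every forward call, so the wrapped module's parameters must already live on that device. A hedged sketch of the usual way to satisfy that requirement (Model and the GPU ids are stand-ins; guarded so it also runs on a machine without those GPUs):

```python
import torch
import torch.nn as nn

# Stand-in for Model(arg) from the thread.
model = nn.Linear(4, 2)
gpu_ids = [5, 7]  # what --gpu 5 7 would produce

if torch.cuda.is_available() and torch.cuda.device_count() > max(gpu_ids):
    # Move the parameters to device_ids[0] *before* wrapping;
    # otherwise DataParallel raises the error quoted above.
    model = model.to(f'cuda:{gpu_ids[0]}')
    model = nn.DataParallel(model, device_ids=gpu_ids)

# Inputs may stay on the CPU; DataParallel scatters them itself.
outputs = model(torch.randn(3, 4))
```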