Data_parallel with multiple inputs

Hi all,

I started using PyTorch yesterday and it works pretty well.

Today I tried to use DataParallel, but I ran into some errors.

I tried to reproduce the error with this simple snippet:

import torch
import torch.utils.data
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import sys
import numpy as np

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        # self.const is a plain Variable pinned to GPU 0; it is not registered with the module
        self.const = Variable(torch.from_numpy(np.zeros((5,5),dtype=np.float32))).cuda(0)

    def forward(self, x, y):
        bat = x.size(0)
        return self.const.unsqueeze(0).expand(bat, 5, 5) + x + y

model = Test()
model = torch.nn.DataParallel(model, device_ids=range(int(sys.argv[1])))  # number of GPUs passed on the command line
inp1  = Variable(torch.from_numpy(np.zeros((6,5,5),dtype=np.float32))).cuda()
inp2  = Variable(torch.from_numpy(np.zeros((6,5,5),dtype=np.float32))).cuda()
print(inp1)
print(model(inp1, inp2))

The error msg is:
RuntimeError: arguments are located on different GPUs at /b/wheel/pytorch-src/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:214

Any clues about what is going wrong here? :joy:

self.const is on GPU 0, but x and y can be on other GPUs.

Should I specify the GPU id in this case? I tried setting both to 0 and also leaving both blank, but neither works.
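
To see what is actually happening, a quick diagnostic is to print the device of each tensor inside forward. This is just a sketch on top of the Test class above (TestDebug is a name I made up for it): each replica receives its slice of the batch on its own GPU, while the plain self.const attribute stays on GPU 0.

class TestDebug(Test):
    def forward(self, x, y):
        # x and y are scattered by DataParallel, so their device differs per replica;
        # self.const is a plain attribute and stays on GPU 0 in every replica.
        print(self.const.data.get_device(), x.data.get_device(), y.data.get_device())
        return super(TestDebug, self).forward(x, y)

model = torch.nn.DataParallel(TestDebug(), device_ids=range(int(sys.argv[1])))
print(model(inp1, inp2))

With more than one GPU this prints different device ids for self.const and the inputs, and then fails with the same RuntimeError.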

I found the solution: I need to register self.const as a parameter of the module, so that DataParallel replicates it to every GPU.

import torch
import torch.utils.data
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import sys
import numpy as np

class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()
        # registered as a frozen parameter, so DataParallel copies it to each replica's device
        self.const = nn.Parameter(torch.from_numpy(np.zeros((5,5),dtype=np.float32)), requires_grad=False)

    def forward(self, x, y):
        bat = x.size(0)
        return self.const.unsqueeze(0).expand(bat, 5, 5) + x + y

model = Test()
model = torch.nn.DataParallel(model, device_ids=range(int(sys.argv[1])))
model = model.cuda()
inp1  = Variable(torch.from_numpy(np.ones((6,5,5),dtype=np.float32))).cuda()
inp2  = Variable(torch.from_numpy(np.ones((6,5,5),dtype=np.float32))).cuda()
print(inp1)
print(model(inp1, inp2))
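
As a quick, optional sanity check (this is just how I verified it locally), the constant now travels with the model:

# model.module is the wrapped Test instance; after model.cuda() its registered
# parameter lives on the default GPU, and DataParallel copies it to every device
# in device_ids during forward.
print(model.module.const.data.get_device())  # typically 0
print(model.module.const.requires_grad)      # False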

When you want to optimize the network, you need to filter the frozen parameter out when constructing the optimizer:

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)
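
To make that line concrete, here is a hypothetical variant of the model with one trainable layer next to the frozen constant (TestWithLinear, the nn.Linear layer, the dummy target, and the MSE loss are all made up for this example). The filter keeps the linear layer's weights and drops const:

class TestWithLinear(Test):
    def __init__(self):
        super(TestWithLinear, self).__init__()
        self.fc = nn.Linear(5, 5)  # trainable layer, added only for illustration

    def forward(self, x, y):
        out = super(TestWithLinear, self).forward(x, y)
        bat = out.size(0)
        # flatten to 2D for the linear layer, then restore the batch shape
        return self.fc(out.view(-1, 5)).view(bat, 5, 5)

model = torch.nn.DataParallel(TestWithLinear(), device_ids=range(int(sys.argv[1])))
model = model.cuda()
# const has requires_grad=False, so the filter leaves only fc.weight and fc.bias
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)

target = Variable(torch.zeros(6, 5, 5)).cuda()  # dummy target, same shape as the output
optimizer.zero_grad()
loss = nn.MSELoss()(model(inp1, inp2), target)
loss.backward()
optimizer.step()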