Data parallelism on pytorch

jmaronas · March 8, 2018, 10:10am

Hi everyone.

I am having some trouble with data parallelism on a system with two GPUs. After reading the tutorial it seems easy, but I cannot get it running.

I have two models, say model1 and model2, and I want to train both on both GPUs. As I understand it, PyTorch takes a minibatch of, let's say, 100 samples, splits it equally between the two GPUs, and then averages the weights. What I do is:

model1=torch.nn.DataParallel(model1).cuda()
model2=torch.nn.DataParallel(model2).cuda()

Then I type:

for i in trainLoader:
    o = model1(i)
    output = model2(o)

I get the error:

raise RuntimeError('all tensors must be on devices[0]')

What is the right way of doing this? For the moment, I put model1 on GPU 0 and model2 on GPU 1 and transfer the output of model1 to GPU 1.

I do not want to merge the models.

Thanks in advance.

ptrblck · March 8, 2018, 10:24am

I don’t think DataParallel will give you any performance advantage if your models run sequentially (input → model1 → model2 → output).

Putting model1 on GPU 0 and model2 on GPU 1, as you describe, is what I would do. Do you get the error message using this approach?
If so, try:

from torch.autograd import Variable

model1 = model1.cuda(0)  # model1 lives on GPU 0
model2 = model2.cuda(1)  # model2 lives on GPU 1

for data in trainloader:
    data = data.cuda(0)
    data = Variable(data)
    output = model1(data)
    output = output.cuda(1)  # transfer it to GPU 1
    output = model2(output)
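
For readers on PyTorch 0.4 or later, where tensors no longer need to be wrapped in Variable, the same placement can be sketched roughly as follows. The two nn.Linear layers are hypothetical stand-ins for model1 and model2, and the CPU fallback is only an assumption added so the sketch also runs on a machine without two GPUs.

```python
import torch
import torch.nn as nn

# Pick two devices; fall back to CPU so the sketch also runs without two GPUs.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

# Hypothetical stand-ins for model1 and model2.
model1 = nn.Linear(10, 20).to(dev0)
model2 = nn.Linear(20, 1).to(dev1)

data = torch.randn(6, 10)          # stand-in for one batch from the loader
hidden = model1(data.to(dev0))     # forward pass on the first device
output = model2(hidden.to(dev1))   # copy the activation, then run on the second device
```

With this layout each batch passes through the two devices one after the other, so the second device waits while the first one works unless several batches are pipelined.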

jmaronas · March 8, 2018, 11:22am

Basically, I understand that data parallelism takes a batch and does the forward pass on both GPUs at the same time, through both models. That is my intention.

The other approach works fine; however, I do not know how much performance I gain.

tjoseph · March 8, 2018, 11:24am

Is there any reason you can’t just wrap them both into one nn.DataParallel? I think that’s best practise!

class Trainer(torch.nn.Module):
    def __init__(self):
        # Module.__init__ must run before sub-modules are assigned.
        super(Trainer, self).__init__()
        self.model1 = Model1()
        self.model2 = Model2()

    def forward(self, x):
        x = self.model1(x)
        x = self.model2(x)

        return x

trainer = torch.nn.DataParallel(Trainer()).cuda()

jmaronas · March 8, 2018, 11:37am

I will try it and report the results. For the moment it is better to have them on the same GPU than to transfer the output of one model to the GPU of the other.

ptrblck · March 8, 2018, 12:10pm

@jmaronas I don’t think you can do a forward pass on both models at the same time, since the input of model2 depends on the output of model1.

@tjoseph If both models fit on one GPU, you could do it, and it seems to be a good approach.
I assumed both GPUs are more or less fully occupied with one model each.
Probably I was wrong.

jmaronas · March 8, 2018, 12:13pm

The point is that what gets parallelized is the data, so if we have a copy of model1 and model2 on the same GPU we can do a forward pass regardless of whether one model feeds its output to the other. So half of the batch goes to one GPU and half to the other. That is how I have understood it works.
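
The splitting described here can be illustrated with a small pure-Python sketch. It is not PyTorch's implementation; split_batch and toy_forward are hypothetical helpers that only mimic how a batch is chunked along its first dimension, processed per replica, and concatenated again. As far as the weights are concerned, nn.DataParallel accumulates the replicas' gradients into the original module during the backward pass rather than averaging weights.

```python
import math

def split_batch(batch, num_devices):
    """Split a batch into at most num_devices contiguous chunks, largest first."""
    chunk_size = max(1, math.ceil(len(batch) / num_devices))
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]

def toy_forward(chunk):
    # Stand-in for model2(model1(chunk)) running on one replica.
    return [2 * x + 1 for x in chunk]

batch = list(range(100))                         # a minibatch of 100 samples
chunks = split_batch(batch, 2)                   # one chunk per GPU
outputs = [toy_forward(c) for c in chunks]       # DataParallel runs these in parallel
gathered = [y for out in outputs for y in out]   # results are concatenated again
```

Because every replica holds both models, each chunk can go through model1 and then model2 without leaving its device.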

ptrblck · March 8, 2018, 12:14pm

Ah ok. Then @tjoseph’s approach seems to be the way to go.
Sorry for any confusion.

jmaronas · March 8, 2018, 12:20pm

Actually, I am trying to get @tjoseph’s approach to work. However, the PyTorch tutorial seems to say that the data should be moved to the GPU like this:

data.cuda()

However I get this error:

raise output
RuntimeError: arguments are located on different GPUs at /pytorch/torch/lib/THC/generated/…/generic/THCTensorMathPointwise.cu:349
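
One way to run into a device mismatch like this under nn.DataParallel, though not necessarily what happened in this case, is a tensor kept as a plain attribute of the module: only parameters and registered buffers are copied to each replica. The sketch below uses a hypothetical Scale module and assumes PyTorch 0.4 or later; it falls back to CPU when no GPU is present.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Hypothetical module that multiplies its input by a constant tensor."""
    def __init__(self):
        super().__init__()
        # A plain attribute (self.scale = some_tensor) is not copied to the other
        # replicas by DataParallel; a registered buffer is broadcast to each device.
        self.register_buffer("scale", torch.full((10,), 2.0))

    def forward(self, x):
        return x * self.scale

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(Scale()).to(device)
x = torch.randn(6, 10, device=device)
out = model(x)  # each replica multiplies its chunk by its own copy of the buffer
```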

ptrblck · March 8, 2018, 12:45pm

Here is a small working example:

import torch
import torch.nn as nn
from torch.autograd import Variable

class SubNet(nn.Module):
    def __init__(self, in_features, out_features):
        super(SubNet, self).__init__()
        self.linear1 = nn.Linear(in_features, out_features)
        
    def forward(self, x):
        x = self.linear1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.model1 = SubNet(10, 20)
        self.model2 = SubNet(20, 1)
        
    def forward(self, x):
        print("input size: {}".format(x.shape))
        x = self.model1(x)
        x = self.model2(x)
        return x
    

x = Variable(torch.randn(6, 10).cuda())

model = Net()
models = nn.DataParallel(model, device_ids=[0, 1])
models.cuda()
output = models(x)

jmaronas · March 8, 2018, 12:58pm

Yes, I did something similar. The catch is that one of my models first copies the parameters of another, pretrained model, so I pass the already constructed model to the __init__ of the model being trained. I will investigate and report back.
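
A setup like the one described could be sketched as below. This is only an assumption about the layout: Pipeline is a hypothetical wrapper, the nn.Linear layers stand in for the real models, and the code assumes PyTorch 0.4 or later with a CPU fallback.

```python
import torch
import torch.nn as nn

class Pipeline(nn.Module):
    """Hypothetical wrapper that receives already constructed sub-models."""
    def __init__(self, model1, model2):
        super().__init__()       # must run before sub-modules are assigned
        self.model1 = model1
        self.model2 = model2

    def forward(self, x):
        return self.model2(self.model1(x))

pretrained = nn.Linear(10, 20)                    # stand-in for the pretrained model
model1 = nn.Linear(10, 20)
model1.load_state_dict(pretrained.state_dict())   # copy the pretrained parameters
model2 = nn.Linear(20, 1)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.DataParallel(Pipeline(model1, model2)).to(device)
output = net(torch.randn(6, 10, device=device))
```

Assigning the sub-models as attributes after calling the parent constructor registers them, so their parameters are replicated together with the wrapper.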