DataParallel causing shape conflicts

ilyes · May 19, 2020, 7:09pm

Hello guys,

Let’s say I have the following forward function in a module I defined:

class Classifier(nn.Module):

    def __init__(self, *args):
        super(Classifier, self).__init__()
        # defining some layers

    def forward(self, feature_batch, tensor_A, *args):
        # some operations between feature_batch and tensor_A
        return results

This module is part of my global model, and tensor_A is a learnable parameter passed in from another module.

# feature_batch.shape : (32, 128)
# tensor_A.shape : (500, 128)  # intended shape
# feature_batch_expanded.shape : (32, 500, 128)
# tensor_A_expanded.shape : (32, 500, 128)  # intended shape
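For readers following along, here is a minimal CPU-only sketch of how the expanded shapes listed above can be produced. The post does not say which operation combines the two tensors, so the dot product below is only a hypothetical stand-in, and the random values are placeholders.

```python
import torch

# Shapes taken from the post; the random values are placeholders.
feature_batch = torch.randn(32, 128)   # (batch, features)
tensor_A = torch.randn(500, 128)       # (rows, features)

# Add a singleton axis, then expand (a view, no copy) to a common shape.
feature_batch_expanded = feature_batch.unsqueeze(1).expand(-1, 500, -1)  # (32, 500, 128)
tensor_A_expanded = tensor_A.unsqueeze(0).expand(32, -1, -1)             # (32, 500, 128)

# Hypothetical operation (an assumption, not from the post):
# dot product of every sample with every row of tensor_A.
scores = (feature_batch_expanded * tensor_A_expanded).sum(dim=-1)        # (32, 500)

print(feature_batch_expanded.shape, tensor_A_expanded.shape, scores.shape)
```

This only works when tensor_A arrives in forward with all 500 rows, which is why a (250, 128) slice breaks the operation.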

I have 2 GPUs, and I wrap the model in DataParallel before starting the training:

global_model = nn.DataParallel(global_model).to(self.device)

The problem is a shape mismatch during the operation, because tensor_A does not have the shape I expect:

# tensor_A.shape : (250, 128)

When I print the shapes inside forward, this is what I get:

# feature_batch.shape : (32, 128)
# tensor_A.shape : (250, 128)
# feature_batch.shape : (32, 128)
# tensor_A.shape : (250, 128)

The forward pass is called twice before the operations start.
My questions are:
1 - Why is feature_batch not being split across the two GPUs? When I print the shape I do not get (16, 128) twice, which suggests that feature_batch is passed to only one GPU.
2 - How can I stop PyTorch from treating 500 as the batch size, so that it does not split tensor_A? Is there an effective solution that does not cause gradient problems?

Thank you

ptrblck · May 21, 2020, 7:58am

This code snippet does the expected splitting:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()

    def forward(self, feature_batch, tensorA):
        print(feature_batch.shape, feature_batch.device)
        print(tensorA.shape, tensorA.device)
        return feature_batch

model = MyModel().cuda()
model = nn.DataParallel(model, device_ids=[0, 1])
feature_batch = torch.randn(32, 128).cuda()
tensorA = torch.randn(500, 128).cuda()
out = model(feature_batch, tensorA)

Output:

torch.Size([16, 128]) cuda:0
torch.Size([250, 128]) cuda:0
torch.Size([16, 128]) cuda:1
torch.Size([250, 128]) cuda:1

Are you sure you haven’t increased the batch size?
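The reason tensorA shows up as (250, 128) is that nn.DataParallel scatters every tensor argument passed to forward along dim 0, not only the one you think of as the batch. A rough CPU-only sketch of that behaviour, using torch.chunk as a stand-in for the real scatter (an approximation for illustration, not the actual implementation; num_replicas is a made-up name):

```python
import torch

num_replicas = 2  # two GPUs in the post; only simulated here on CPU

feature_batch = torch.randn(32, 128)
tensorA = torch.randn(500, 128)

# Stand-in for DataParallel's scatter: split each input along dim 0.
feature_chunks = feature_batch.chunk(num_replicas, dim=0)
tensorA_chunks = tensorA.chunk(num_replicas, dim=0)

for replica, (f, a) in enumerate(zip(feature_chunks, tensorA_chunks)):
    print(replica, tuple(f.shape), tuple(a.shape))
# 0 (16, 128) (250, 128)
# 1 (16, 128) (250, 128)
```

Both inputs are halved in the same way, which matches the shapes printed by the snippet above.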

Anyway, if you want to pass the same tensorA to all replicas, you could use:

tensorA = tensorA.unsqueeze(0).expand(2, -1, -1)
out = model(feature_batch, tensorA)
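Note that with this approach each replica should receive a slice with a leading axis of size 1, i.e. (1, 500, 128), so forward would need a squeeze(0) to get back to (500, 128). A small CPU-only sketch, again using torch.chunk as an approximation of the scatter step (num_replicas is a hypothetical name, set to 2 to match the two GPUs):

```python
import torch

num_replicas = 2  # matches expand(2, -1, -1) above

tensorA = torch.randn(500, 128)
tensorA_expanded = tensorA.unsqueeze(0).expand(num_replicas, -1, -1)  # (2, 500, 128)

# CPU stand-in for the scatter along dim 0: one slice per replica.
per_replica = tensorA_expanded.chunk(num_replicas, dim=0)

# Each slice keeps a leading axis of size 1; drop it inside forward.
restored = [piece.squeeze(0) for piece in per_replica]

for piece, full in zip(per_replica, restored):
    print(tuple(piece.shape), "->", tuple(full.shape))
# (1, 500, 128) -> (500, 128)
# (1, 500, 128) -> (500, 128)
```

Since unsqueeze, expand and squeeze are all differentiable views, gradients should still flow back to the original tensorA, but it is worth checking that on your actual model.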