Inconsistent input size error while using DataParallel

I was trying to implement multi-GPU training using PyTorch. However, I got the following error:

RuntimeError: input has inconsistent input_size: got 256, expected 512

The code works perfectly on a single GPU, but I get this error when I try to use multiple GPUs. Any suggestions would be really appreciated.
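The wording of this message matches the input-size check of PyTorch's recurrent cells such as `nn.LSTMCell`, which compare the number of features in the input with the `input_size` the cell was built with (the exact text differs between PyTorch versions). The 2-D `h0`/`c0` tensors in the snippet suggest that `Net_G` uses such a cell, but that is an assumption because `Net_G` is not shown. A CPU-only sketch that reproduces the same kind of failure, with 512 as a placeholder for the cell's `input_size`:

```python
import torch
import torch.nn as nn

# Hypothetical size: 512 stands in for the input_size of the cell inside Net_G.
HIDDEN = 512
cell = nn.LSTMCell(input_size=HIDDEN, hidden_size=HIDDEN)

batch = 4
h0 = torch.zeros(batch, HIDDEN)
c0 = torch.zeros(batch, HIDDEN)

# Matching feature size: the cell returns the next hidden and cell states.
h1, c1 = cell(torch.randn(batch, HIDDEN), (h0, c0))
print(h1.shape)

# Half the expected features (256 instead of 512): the cell rejects the input.
try:
    cell(torch.randn(batch, 256), (h0, c0))
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```

So the question becomes where, on the multi-GPU path only, a tensor with 256 instead of 512 features is produced.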

This is a snippet of the code:

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1,3'  # physical GPUs 1 and 3 become cuda:0 and cuda:1

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

# Net_G, Net_D, Param, COCOData, RandCrop, inf_get, destroy, bce_loss and ones_31
# are defined elsewhere in the project.

netG = Net_G().cuda()
netD = Net_D().cuda()

# Loading the model from the checkpoint

epoch = 23

# Multiple GPUs
if torch.cuda.is_available():
    netG = nn.DataParallel(netG, device_ids=[0, 1])
    netD = nn.DataParallel(netD, device_ids=[0, 1])

h0 = torch.zeros(Param.batch_size, Param.unet_channel * 8 * 2).cuda()
c0 = torch.zeros(Param.batch_size, Param.unet_channel * 8 * 2).cuda()

opt_G = optim.Adam(netG.parameters(), lr=Param.G_learning_rate, betas=(0.5, 0.999), weight_decay=Param.weight_decay)
opt_D = optim.Adam(netD.parameters(), lr=Param.D_learning_rate, betas=(0.5, 0.999), weight_decay=Param.weight_decay)

trainset = COCOData('COCOTrain.csv', RandCrop())

#### Dataloader
train_loader = torch.utils.data.DataLoader(trainset, batch_size=Param.batch_size, shuffle=True, num_workers=2, drop_last=True)

# Create a generator of images from the training dataset
train_data = inf_get(train_loader)
real_data =

        #### Corrupt the data in four phases
        real_data = real_data.cuda()
        real_data_64 = destroy(real_data, 64)
        real_data_48 = destroy(real_data, 48)
        real_data_32 = destroy(real_data, 32)
        real_data_16 = destroy(real_data, 16)

        #### Wrap the corrupted data in four phases
        real_data_64 = Variable(real_data_64)
        real_data_48 = Variable(real_data_48)
        real_data_32 = Variable(real_data_32)
        real_data_16 = Variable(real_data_16)
        real_data_0 = Variable(real_data)

        ##### Initialize the gradients to zero

        p_real_48, p_real_32, p_real_16, p_real_0 = netD(real_data_48.cuda(), real_data_32.cuda(), real_data_16.cuda(), real_data_0.cuda())
        target = Variable(ones_31)

        real_loss_48 = bce_loss(p_real_48.cuda(), target.cuda())
        real_loss_32 = bce_loss(p_real_32.cuda(), target.cuda())
        real_loss_16 = bce_loss(p_real_16.cuda(), target.cuda())
        real_loss_0 = bce_loss(p_real_0.cuda(), target.cuda())

        # Error here: RuntimeError: input has inconsistent input_size: got 256, expected 512
        fake_data_48, fake_data_32, fake_data_16, fake_data_0 = netG(real_data_64, real_data_48, real_data_32, real_data_16, Variable(h0), Variable(c0))

        p_fake_48, p_fake_32, p_fake_16, p_fake_0 = netD(Variable(, Variable(, Variable(, Variable(
```
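What changes when going from one GPU to two: `nn.DataParallel` scatters every tensor argument of `forward` along dimension 0 (its default `dim`) and gives one chunk to each replica, which then runs the same `forward` on its chunk. That includes `h0` and `c0` above. The sketch below mimics the split on the CPU with `torch.chunk`; the batch size of 8 and hidden size of 512 are hypothetical stand-ins for `Param.batch_size` and `Param.unet_channel * 8 * 2`, and the image shape is illustrative only:

```python
import torch

BATCH_SIZE = 8   # hypothetical stand-in for Param.batch_size
HIDDEN = 512     # hypothetical stand-in for Param.unet_channel * 8 * 2
NUM_GPUS = 2     # device_ids=[0, 1]

images = torch.randn(BATCH_SIZE, 3, 64, 64)  # an image batch (illustrative shape)
h0 = torch.zeros(BATCH_SIZE, HIDDEN)

# nn.DataParallel splits each tensor argument along dim 0, one chunk per device.
image_chunks = torch.chunk(images, NUM_GPUS, dim=0)
h0_chunks = torch.chunk(h0, NUM_GPUS, dim=0)

for i, (img, h) in enumerate(zip(image_chunks, h0_chunks)):
    print(f"replica {i}: images {tuple(img.shape)}, h0 {tuple(h.shape)}")
# replica 0: images (4, 3, 64, 64), h0 (4, 512)
# replica 1: images (4, 3, 64, 64), h0 (4, 512)
```

Because the split is always along dimension 0, tensors whose first dimension is not the batch (an `nn.LSTM` hidden state shaped `(num_layers, batch, hidden)`, for example) get split incorrectly; the `(batch, hidden)` layout of `h0`/`c0` here is compatible with it.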

(Not an answer, but anyway) Does the batch size change between the single-GPU and multi-GPU cases? If yes, could the network's forward method somehow be hardcoded for a certain batch size?
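A sketch of how a hardcoded batch size would produce exactly this error, assuming (since `Net_G` is not shown) that its `forward` reshapes with the global batch size, e.g. `x.view(Param.batch_size, -1)`. On a replica that holds half the batch, the same reshape spreads half as many elements over the same number of rows, so 512 features per sample become 256. `BATCH_SIZE`, the feature-map shape and both helper functions are hypothetical:

```python
import torch

BATCH_SIZE = 8  # hypothetical stand-in for Param.batch_size
NUM_GPUS = 2    # device_ids=[0, 1]

def flatten_hardcoded(x):
    # Assumed anti-pattern: reshapes with the *global* batch size.
    return x.view(BATCH_SIZE, -1)

def flatten_per_replica(x):
    # Takes the batch size from the tensor that actually arrived.
    return x.view(x.size(0), -1)

# A feature map with 32 * 4 * 4 = 512 features per sample.
full_batch = torch.randn(BATCH_SIZE, 32, 4, 4)

# What replica 0 receives under nn.DataParallel (a chunk along dim 0).
replica_input = torch.chunk(full_batch, NUM_GPUS, dim=0)[0]

print(flatten_hardcoded(full_batch).shape)       # torch.Size([8, 512]): fine on one GPU
print(flatten_hardcoded(replica_input).shape)    # torch.Size([8, 256]): features halved
print(flatten_per_replica(replica_input).shape)  # torch.Size([4, 512])
```

On a single GPU the hardcoded version happens to work because the tensor really does hold `BATCH_SIZE` samples, which would explain why the bug only shows up with `nn.DataParallel`.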

Yes, the batch size changes from the single-GPU to the multi-GPU setup. However, I also tried the same batch size and got the same error.
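Keeping the batch size fixed does not change the per-replica batch: with `device_ids=[0, 1]`, each replica always receives half of `Param.batch_size` samples, so any size computed from `Param.batch_size` inside `forward` is off by a factor of two. One way around it, shown on a hypothetical `RecurrentBlock` module standing in for part of `Net_G`, is to derive every batch-dependent size from the incoming tensor, including the initial LSTM states, instead of passing in `h0`/`c0` sized for the full batch:

```python
import torch
import torch.nn as nn

class RecurrentBlock(nn.Module):
    """Hypothetical stand-in for the recurrent part of Net_G."""

    def __init__(self, features: int):
        super().__init__()
        self.features = features
        self.cell = nn.LSTMCell(input_size=features, hidden_size=features)

    def forward(self, x):
        # Flatten using the batch size of this replica's chunk, not a global constant.
        x = x.view(x.size(0), -1)
        # Allocate zero initial states with the same batch size, dtype and device as x.
        h0 = x.new_zeros(x.size(0), self.features)
        c0 = x.new_zeros(x.size(0), self.features)
        h1, _ = self.cell(x, (h0, c0))
        return h1

block = RecurrentBlock(features=512)
# Different per-replica batch sizes all work, e.g. 4 (8 split over 2 GPUs) or 8.
for batch in (4, 8):
    out = block(torch.randn(batch, 32, 4, 4))  # 32 * 4 * 4 = 512 features
    print(out.shape)
```

With the states created inside `forward`, `netG` would no longer need the `Variable(h0), Variable(c0)` arguments; the trade-off is that the initial state is always zero, which matches what the snippet passes in anyway.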