Strange shape mismatch issue with torch.BCEWithLogitsLoss()

I am writing a relativistic GAN. I have the following generator update function

def optimize_generator(self, real_batch, fake_batch):
    """
    Relativistic generator update step.
    """
    valid = torch.ones(real_batch.size(0), device=self.device)
    fake = torch.zeros(fake_batch.size(0), device=self.device)

    y_pred = self.netD(real_batch)
    y_pred_fake = self.netD(fake_batch.detach())

    # debug prints: the two input shapes, the two target shapes, and the equality checks
    print("(y_pred - torch.mean(y_pred_fake)).squeeze().size() : ", (y_pred - torch.mean(y_pred_fake)).squeeze().size())
    print("(y_pred_fake - torch.mean(y_pred)).squeeze().size() : ", (y_pred_fake - torch.mean(y_pred)).squeeze().size())
    print("fake.size() : ", fake.size())
    print("valid.size() : ", valid.size())

    print((y_pred - torch.mean(y_pred_fake)).squeeze().size() == fake.size())
    print((y_pred_fake - torch.mean(y_pred)).squeeze().size() == valid.size())

    real_v_fake = self.loss((y_pred - torch.mean(y_pred_fake)).squeeze(), fake)
    fake_v_real = self.loss((y_pred_fake - torch.mean(y_pred)).squeeze(), valid)

    g_loss = (real_v_fake + fake_v_real) / 2

    # gradient update pass
    self.gen_optim.zero_grad()
    g_loss.backward()
    self.gen_optim.step()

    return g_loss

I have a very strange issue with the dimensions of the input and the target, shown in the following console output:


As you can see, the size of the input and the target is the same, torch.Size([64]), and a boolean equality check even succeeds, but BCEWithLogitsLoss keeps throwing this error. I had a look at the torch/nn/functional.py code, and I seem to be failing this exact check (line 2433):

if not (target.size() == input.size()):
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))

I am at a bit of a loss, because I don’t see how that’s different from the boolean equality I tested before the call to BCEWithLogitsLoss. Any help would be much appreciated!
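For reference, the check in functional.py only trips when the two sizes really do differ; a minimal standalone illustration of what it catches (dummy tensors only, not the real discriminator outputs) would be a (64,) input against a (64, 1) target:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(64)      # input of shape (64,)
    target = torch.zeros(64, 1)   # target of shape (64, 1)

    # raises: ValueError: Target size (torch.Size([64, 1])) must be the same
    # as input size (torch.Size([64]))
    F.binary_cross_entropy_with_logits(logits, target)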

The size check still fails even after a reshape:

    valid = torch.ones(real_batch.size(0), device=self.device)
    fake = torch.zeros(fake_batch.size(0), device=self.device)

    y_pred = self.netD(real_batch)
    y_pred_fake = self.netD(fake_batch.detach())

    real_v_fake = (y_pred - torch.mean(y_pred_fake))
    fake_v_real = (y_pred_fake - torch.mean(y_pred))

    real_v_fake = torch.reshape(real_v_fake, (real_batch.size(0), 1))
    fake_v_real = torch.reshape(fake_v_real, (real_batch.size(0), 1))
    valid = torch.reshape(valid, (real_batch.size(0), 1))
    fake = torch.reshape(fake, (real_batch.size(0), 1))

    real_v_fake = self.loss(real_v_fake, fake)
    fake_v_real = self.loss(fake_v_real, valid)

meaning torch.nn.BCEWithLogitsLoss still complains about different shapes even after PyTorch successfully reshapes everything to the exact same shape, (real_batch.size(0), 1).
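As a sanity check, with random tensors standing in for the discriminator outputs (dummy data, not the real model), the loss itself accepts two (64, 1) tensors without complaint, which suggests the failing call must be coming from somewhere else:

    import torch

    loss = torch.nn.BCEWithLogitsLoss()

    # dummy stand-ins for the reshaped logits and targets
    real_v_fake = torch.randn(64, 1)
    fake = torch.zeros(64, 1)

    print(real_v_fake.size() == fake.size())   # True
    print(loss(real_v_fake, fake))             # computes fine, no ValueError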

Could you post an executable code snippet to reproduce this error, please?

I also have a strange issue with BCEWithLogitsLoss, simply when computing the loss through

loss = criterion(outputs.squeeze(), labels)

The computation works just fine 99.9% of the time, but when I use this on my dataset with 7937 rows and a batch size of 256, I get the error

File "/nics/b/home/doursand/anaconda3/envs/limsnet/lib/python3.7/site-packages/torch/nn/functional.py", line 2580, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([1])) must be the same as input size (torch.Size([]))

However, if I change the batch size value to 255 instead of 256, then everything works like a charm!

Now if you do the calculation,

7937 rows / 256 = 31.0039062

and

7937 rows / 255 = 31.1254902

Question: is it a rounding issue that makes the iteration with 256 images per batch fail (i.e. 31.0039062 being truncated to 31)?

I’m an idiot, the error is not happening in optimize_generator but in optimize_discriminator, which I called earlier. The shape equality check succeeded … just not in the function that was actually raising the error.

This is probably related to the last batch being incomplete: 7937 % 256 = 1, so the final batch holds a single sample, and outputs.squeeze() turns its output into a 0-dim tensor while labels keeps the shape [1], which is exactly the mismatch in the traceback. You can change the batch size as you did, or you can pass drop_last=True to the DataLoader to avoid this issue.
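For example, a rough sketch of what presumably happens on that last batch (dummy tensors and a dummy dataset standing in for the real model and data):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    print(7937 % 256)                 # 1 -> the final batch holds a single sample

    outputs = torch.randn(1, 1)       # model output for that one-sample batch
    labels = torch.zeros(1)
    print(outputs.squeeze().shape)    # torch.Size([])  -> 0-dim after squeeze()
    print(labels.shape)               # torch.Size([1]) -> the reported mismatch

    # drop_last=True simply discards the incomplete final batch
    dataset = TensorDataset(torch.randn(7937, 10), torch.zeros(7937))
    loader = DataLoader(dataset, batch_size=256, shuffle=True, drop_last=True)
    print(len(loader))                # 31 full batches, no one-sample batch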
