Linear layer output dim -> target and input size mismatch when calculating loss

For context, I’m trying to implement the Creative Adversarial Network (CAN, a GAN variant) in PyTorch. This GAN’s discriminator sends two signals: the usual real-vs-fake signal, and a signal for how easily it can classify the image. So the discriminator’s forward() method has two outputs.

Unfortunately, I’m now running into an input size mismatch when calculating multilabel_soft_margin_loss.

I get the following error:

    ValueError: Target and input must have the same number of elements. target nelement (64) != input nelement (1024)

This is my model’s structure:

    import torch.nn as nn

    # Class and constructor names are assumed; the post showed only their bodies.
    class Discriminator(nn.Module):
        def __init__(self, channels):
            super().__init__()
            num_disc_filters = 64
            self.conv = nn.Sequential(
                nn.Conv2d(channels, num_disc_filters, 4, 2, 1, bias=False),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(num_disc_filters, num_disc_filters * 2, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_disc_filters * 2),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(num_disc_filters * 2, num_disc_filters * 4, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_disc_filters * 4),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(num_disc_filters * 4, num_disc_filters * 16, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_disc_filters * 16),
                nn.LeakyReLU(0.2, inplace=True),
            )
            # Real/fake head: a single output channel
            self.final_conv = nn.Conv2d(num_disc_filters * 16, 1, 4, 1, 0, bias=False)
            self.sig = nn.Sigmoid()
            self.real_or_fake = nn.Linear(num_disc_filters * 16, 1)
            # Style head; names use '_' because current PyTorch rejects '.' in module names
            self.fc = nn.Sequential()
            self.fc.add_module('relu_{0}'.format(num_disc_filters * 16), nn.LeakyReLU(0.2, inplace=True))
            self.fc.add_module('relu_{0}'.format(num_disc_filters), nn.LeakyReLU(0.2, inplace=True))
            self.fc.add_module('relu_{0}'.format(num_disc_filters), nn.LeakyReLU(0.2, inplace=True))

        def forward(self, inp):
            x = self.conv(inp)
            real = self.final_conv(x)
            x = x.view(-1, x.size(1))
            print(x.size(), "xs")  # torch.Size([1024, 1024])

            real_out = self.sig(real)
            real_out = real_out.view(-1, 1).squeeze(1)
            style = self.fc(x)
            print(style.size(), "style size")  # torch.Size([1024, 1]) style size
            return real_out, style

style is then assigned to output_styles and used in the following line:

    err_disc_style = criterion_style(output_styles, style_labels)

where criterion_style is nn.MultiLabelSoftMarginLoss() and style_labels is a 64x1 tensor of class labels (ints).
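To see what MultiLabelSoftMarginLoss expects, here is a minimal sketch with random tensors standing in for the real outputs. It assumes the labels are 0/1 indicators (which is how this loss interprets targets), cast to float; none of the tensors come from the post. Input and target must both be (N, C), so a (1024, 1) output cannot be paired with (64, 1) labels. Depending on the PyTorch version, the mismatch shows up as a ValueError (as in the post) or as a RuntimeError from broadcasting.

```python
import torch
import torch.nn as nn

criterion_style = nn.MultiLabelSoftMarginLoss()
batch_size = 64

# Assumed 0/1 style labels, shape (64, 1); the loss works on float targets.
style_labels = torch.randint(0, 2, (batch_size, 1)).float()

# Matching (N, C) shapes: one logit per sample and label gives a scalar loss.
good_output = torch.randn(batch_size, 1)
loss = criterion_style(good_output, style_labels)
print(loss.shape)  # torch.Size([])

# The shape forward() actually returned: 1024 rows instead of 64.
bad_output = torch.randn(1024, 1)
mismatch_raised = False
try:
    criterion_style(bad_output, style_labels)
except (ValueError, RuntimeError) as err:
    mismatch_raised = True
    print(type(err).__name__, err)
```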

Why is style’s size 1024x1 instead of 64x1?
Surely self.fc should ensure this with the line self.fc.add_module("linear_layer.{0}".format(num_disc_filters),nn.Linear(num_disc_filters,1)) ?

Any help would be greatly appreciated - thanks in advance!


The first dimension is always the batch size. So x has a batch size of 1024, with 1024 features for each sample.
Going through self.fc changes only the feature size, and the last linear layer reduces it to 1. The output therefore keeps the same batch size (1024) with a feature size of 1, hence the output size of 1024x1.
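A quick way to check this: nn.Linear only transforms the last dimension, so however many rows go in, the same number come out. A minimal sketch with random tensors; the 1024-to-1 layer is illustrative rather than the post’s exact self.fc.

```python
import torch
import torch.nn as nn

# nn.Linear maps the last (feature) dimension; leading dims pass through unchanged.
fc = nn.Linear(1024, 1)

x_wrong = torch.randn(1024, 1024)  # what forward() produced: 1024 "samples"
x_right = torch.randn(64, 1024)    # one row per image in a batch of 64

print(fc(x_wrong).shape)  # torch.Size([1024, 1])
print(fc(x_right).shape)  # torch.Size([64, 1])
```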

Oh okay, thanks very much! What changes would you recommend to self.fc / forward(self, inp) so that style’s size is 64x1?

real_out and the associated real-vs-fake ‘normal’ GAN loss work fine, so I don’t think the problem lies with self.conv or self.final_conv. I just need to get this additional classification loss working on top of the usual GAN loss.


How many samples are in your batch initially, 1024 or 64?
If it’s 1024, then your target size is wrong and you should find out why you only have 64 targets.
If it’s 64, then you need to find out why x has size 1024 when it should only be 64.
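One way to answer that is to log tensor shapes as data flows through the network. The sketch below registers forward hooks (register_forward_hook) on the Conv2d layers of the conv stack from the question. It assumes 3-channel 64x64 images and a batch of 64; the post doesn’t state the image size, but these values are consistent with the printed sizes.

```python
import torch
import torch.nn as nn

channels, num_disc_filters = 3, 64  # assumption: RGB input images
conv = nn.Sequential(
    nn.Conv2d(channels, num_disc_filters, 4, 2, 1, bias=False),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(num_disc_filters, num_disc_filters * 2, 4, 2, 1, bias=False),
    nn.BatchNorm2d(num_disc_filters * 2),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(num_disc_filters * 2, num_disc_filters * 4, 4, 2, 1, bias=False),
    nn.BatchNorm2d(num_disc_filters * 4),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(num_disc_filters * 4, num_disc_filters * 16, 4, 2, 1, bias=False),
    nn.BatchNorm2d(num_disc_filters * 16),
    nn.LeakyReLU(0.2, inplace=True),
)

shapes = []

def record_shape(module, inputs, output):
    # Forward hooks get (module, inputs, output); store the output shape.
    shapes.append((module.__class__.__name__, tuple(output.shape)))

for layer in conv:
    if isinstance(layer, nn.Conv2d):
        layer.register_forward_hook(record_shape)

inp = torch.randn(64, channels, 64, 64)  # assumption: batch of 64, 64x64 images
with torch.no_grad():
    x = conv(inp)

for name, shape in shapes:
    print(name, shape)  # batch dim stays 64 through every conv
flattened = x.view(-1, x.size(1))
print("after view(-1, x.size(1)):", tuple(flattened.shape))
```

The batch dimension is still 64 at the end of self.conv; the 1024 rows only appear at the view call.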

I’m 99.999% sure it’s 64, as I set batch_size=64 in my DataLoader. The GAN I’m trying to implement has a discriminator with 1024 output filters/channels, so I’m fairly sure about self.final_conv = nn.Conv2d(num_disc_filters * 16, 1, 4, 1, 0, bias=False), and that does work out to 64x1 in real_out.
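Both observations follow from the Conv2d output-size formula in the PyTorch docs (with dilation 1): out = floor((in + 2 * padding - kernel) / stride) + 1. Assuming 64x64 inputs (not stated in the post), this small calculation shows why real_out has exactly 64 values, and why x.view(-1, x.size(1)) folds the 4x4 spatial positions into the first dimension, giving 64 * 4 * 4 = 1024 rows.

```python
def conv_out(size, kernel, stride, padding):
    # Spatial output size of nn.Conv2d with dilation 1:
    # floor((size + 2 * padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 64  # assumption: 64x64 input images
for _ in range(4):  # the four kernel-4, stride-2, padding-1 convs in self.conv
    size = conv_out(size, kernel=4, stride=2, padding=1)
print(size)  # 4, so self.conv outputs (64, 1024, 4, 4)

# final_conv (kernel 4, stride 1, padding 0) turns 4x4 into 1x1,
# leaving one real/fake value per image.
final_size = conv_out(size, kernel=4, stride=1, padding=0)
print(final_size)  # 1

batch_size = 64
# view(-1, 1024) keeps 1024 columns, so batch and spatial dims merge into rows.
rows = batch_size * size * size
print(rows)  # 1024
```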

I guess the problem is that my linear layers (supposed to be built on top of the convolutional ones) are wrongly configured? Anything you can see that could be the problem?

Also, what do you think is the simplest way forward?

Many thanks once again!

What is this line x = x.view(-1, x.size(1)) supposed to do? Don’t you want x = x.view(x.size(0), -1) here, to flatten each sample’s 2D feature maps into a 1D feature vector so it can be fed to the linear layers?
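A minimal sketch of that suggestion, assuming self.conv outputs (N, 1024, 4, 4) as it does for 64x64 inputs. After x.view(x.size(0), -1), each sample has 1024 * 4 * 4 = 16384 features, so the first linear layer must take 16384 inputs, not 1024. The hidden size of 64 and the layer layout are hypothetical, not the post’s actual self.fc.

```python
import torch
import torch.nn as nn

num_disc_filters = 64
feat_ch, feat_hw = num_disc_filters * 16, 4  # assumed self.conv output: (N, 1024, 4, 4)

# Hypothetical style head: the first Linear sees every feature of one sample.
fc = nn.Sequential(
    nn.Linear(feat_ch * feat_hw * feat_hw, num_disc_filters),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Linear(num_disc_filters, 1),
)

x = torch.randn(64, feat_ch, feat_hw, feat_hw)  # stand-in for self.conv(inp)
flat = x.view(x.size(0), -1)  # keep the batch dim, flatten the rest
print(flat.shape)             # torch.Size([64, 16384])
style = fc(flat)
print(style.shape)            # torch.Size([64, 1])

# Shapes now match the (64, 1) labels (assumed 0/1, cast to float).
style_labels = torch.randint(0, 2, (64, 1)).float()
loss = nn.MultiLabelSoftMarginLoss()(style, style_labels)
```

x.flatten(1) or nn.Flatten() do the same flattening. If you would rather keep a 1024-input first layer, pooling the 4x4 maps first (e.g. nn.AdaptiveAvgPool2d(1)) is a different design that also yields 1024 features per sample. If style_labels hold more than two style classes, a (64, num_styles) output with nn.CrossEntropyLoss and (64,) integer targets is the more usual setup for single-label classification.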

Yeah, that’s what that line is supposed to do; IIRC I based it on what someone on Stack Overflow with a similar task had done. But thanks, I’ll try your suggestion!