Strided convolution dimensionality problem: ValueError: Target and input must have the same number of elements

Hi, world!

First of all, thank you for caring enough about a PyTorch newbie to be reading this question right now.

As for the question itself: I am implementing a DCGAN Discriminator. Here is the code:

import torch
import torch.nn as nn
import torch.nn.functional as F

# img_channels and img_size are defined at module level (3 and 32 in my case)

class Discriminator(nn.Module):
    def __init__(self, filter_sizes, leaky_relu_alpha):
        super(Discriminator, self).__init__()

        # Network architecture
        self.conv_1 = nn.Conv2d(in_channels=img_channels, out_channels=filter_sizes[0], kernel_size=4, stride=2, padding=1)
        
        self.conv_2 = nn.Conv2d(in_channels=filter_sizes[0], out_channels=filter_sizes[1], kernel_size=4, stride=2, padding=1)
        self.conv_2_bn = nn.BatchNorm2d(filter_sizes[1])
        
        self.conv_3 = nn.Conv2d(in_channels=filter_sizes[1], out_channels=filter_sizes[2], kernel_size=4, stride=2, padding=1)
        self.conv_3_bn = nn.BatchNorm2d(filter_sizes[2])
        
        self.conv_4 = nn.Conv2d(in_channels=filter_sizes[2], out_channels=filter_sizes[3], kernel_size=4, stride=2, padding=1)
        self.conv_4_bn = nn.BatchNorm2d(filter_sizes[3])
        
        # After four stride-2 convolutions the feature maps are (img_size//16) x (img_size//16)
        self.dense = nn.Linear(in_features=filter_sizes[3] * (img_size//16) * (img_size//16), out_features=1)

        # Hyperparameters
        self.filter_sizes = filter_sizes

        self.leaky_relu_alpha = leaky_relu_alpha

    def forward(self, x):
        # Conv 1 | out:[16 x 16 x 128]
        x = self.conv_1(x)
        x = F.leaky_relu(x, self.leaky_relu_alpha)

        # Conv 2 | out:[8 x 8 x 256]
        x = self.conv_2(x)
        x = self.conv_2_bn(x)
        x = F.leaky_relu(x, self.leaky_relu_alpha)

        # Conv 3  | out:[4 x 4 x 512]
        x = self.conv_3(x)
        x = self.conv_3_bn(x)
        x = F.leaky_relu(x, self.leaky_relu_alpha)

        # Conv 4 | out:[2 x 2 x 1024]
        x = self.conv_4(x)
        x = self.conv_4_bn(x)
        x = F.leaky_relu(x, self.leaky_relu_alpha)

        # Classification layer
        # Flatten to [batch_size, filter_sizes[3] * (img_size//16) * (img_size//16)]
        x = x.view(-1, self.filter_sizes[3] * (img_size//16) * (img_size//16))

        x = self.dense(x)
        x = torch.sigmoid(x)

        return x

When I run it, I get the following error while evaluating the model output with the Binary Cross Entropy criterion (BCELoss):

ValueError: Target and input must have the same number of elements. target nelement (128) != input nelement (288)

So, to my understanding, the problem is that I am not handling dimensionality correctly: the strided convolutions change the width and height dimensions to unexpected values, and the flattening step then wraps the convolution output into more samples than the input batch_size.
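
To illustrate what I mean (a minimal sketch with hypothetical numbers, not my actual training code): if the tensor reaching the flatten step has more spatial elements per sample than the dense layer expects, view(-1, ...) silently grows the batch dimension:

import torch

batch_size, channels = 128, 1024
expected = channels * 2 * 2                    # per-sample size the flatten/dense layers assume
x = torch.randn(batch_size, channels, 3, 3)    # 3 x 3 feature maps instead of the expected 2 x 2

flat = x.view(-1, expected)
print(flat.shape)                              # torch.Size([288, 4096]) -> 288 "samples", not 128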

Now, I have checked the documentation on Conv2d, and, following the output size formula given there, I think the reduction of the width and height dimensions should be correct.
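
For reference, with kernel_size=4, stride=2, padding=1 each convolution should exactly halve the spatial dimensions (a quick sanity check, assuming 32 x 32 inputs as in my setup):

def conv2d_out(size, kernel=4, stride=2, padding=1):
    # Conv2d output size formula from the docs: floor((H_in + 2*padding - kernel) / stride + 1)
    return (size + 2 * padding - kernel) // stride + 1

size = 32
for i in range(4):
    size = conv2d_out(size)
    print(f"after conv_{i + 1}: {size} x {size}")   # 16, 8, 4, 2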

Also, when I check the input dimensions, the tensor's 0th dimension is indeed 128, and the error happens on the very first batch of the very first training iteration, so this doesn't seem to be the problem described in that post.

Any direction in this matter would be highly appreciated.
Have a nice start of the week, my good folk!

What is the input shape, i.e. channels and img_size? I get no error using the following:

import torch
import torch.nn as nn

img_channels, img_size = 3, 256   # module-level globals used by Discriminator

def main():
    m = Discriminator([3, 3, 3, 3], 0.2)

    x = torch.ones((50, 3, 256, 256))   # batch of 50 images, 3 channels, 256 x 256
    y = torch.ones((50, 1))             # matching targets, shape [50, 1]

    p = m(x)                            # comes out as [50, 1]

    obj = nn.BCELoss()
    loss = obj(p, y)                    # no shape mismatch here

Hi, @donJuan!

Damn, sorry, you are right, that’s indeed very valuable information.

I am using a batch size of 128 RGB images of 32 x 32, which results in an input tensor of [128, 3, 32, 32].
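
For what it's worth, the Discriminator itself runs fine with that shape (a quick standalone check, assuming img_channels = 3, img_size = 32 and the filter sizes from the comments in my code):

m = Discriminator([128, 256, 512, 1024], 0.2)

real = torch.randn(128, 3, 32, 32)   # a batch of real 32 x 32 RGB images
labels = torch.ones(128, 1)

out = m(real)                        # shape [128, 1], matching the labels
loss = nn.BCELoss()(out, labels)     # no error with correctly sized inputs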

Okay, damn, my bad.

The problem was, stupid me, that the images produced by the Generator didn't have the proper size, because I hadn't corrected the padding values in its layers yet.
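
In case it helps anyone else: with the usual DCGAN ConvTranspose2d settings (kernel_size=4, stride=2, padding=1) each generator layer should exactly double the spatial size, so the output size is easy to sanity check (a minimal sketch with hypothetical layer counts, since my Generator code isn't shown here):

def conv_transpose2d_out(size, kernel=4, stride=2, padding=1):
    # ConvTranspose2d output size: (H_in - 1) * stride - 2 * padding + kernel
    return (size - 1) * stride - 2 * padding + kernel

size = 4            # e.g. spatial size right after projecting the latent vector
for _ in range(3):  # three upsampling layers: 4 -> 8 -> 16 -> 32
    size = conv_transpose2d_out(size)
print(size)         # 32, i.e. the 32 x 32 images the Discriminator expects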

I'm really sorry, and thank you kindly, @donJuan! You pointed me in the right direction. :)