Generator returns the wrong output shape (28 x 28 instead of 32 x 32)

Hello Everyone,

from torch import nn

class Generator(nn.Module):
    '''
    Generator Class
    Values:
        input_dim: the dimension of the input vector, a scalar
        im_chan: the number of channels of the output image, a scalar
              (MNIST is black-and-white, so 1 channel; CIFAR-10 is RGB, so 3 is the default here)
        hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, input_dim=10, im_chan=3, hidden_dim=64):
        super(Generator, self).__init__()
        self.input_dim = input_dim
        # Build the neural network
        self.gen = nn.Sequential(
            self.make_gen_block(input_dim, hidden_dim * 4),
            self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
            self.make_gen_block(hidden_dim * 2, hidden_dim),
            self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True),
        )

    def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, final_layer=False):
        '''
        Function to return a sequence of operations corresponding to a generator block of DCGAN;
        a transposed convolution, a batchnorm (except in the final layer), and an activation.
        Parameters:
            input_channels: how many channels the input feature representation has
            output_channels: how many channels the output feature representation should have
            kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
            stride: the stride of the convolution
            final_layer: a boolean, true if it is the final layer and false otherwise
                      (affects activation and batchnorm)
        '''
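        if not final_layer:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                # batchnorm + activation per the docstring (ReLU assumed, as is usual for DCGAN)
                nn.BatchNorm2d(output_channels),
                nn.ReLU(inplace=True),
            )
        else:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                # final layer: no batchnorm (Tanh assumed, as is usual for DCGAN)
                nn.Tanh(),
            )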

    def forward(self, noise):
        '''
        Function for completing a forward pass of the generator: Given a noise tensor,
        returns generated images.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, input_dim)
        '''
        x = noise.view(len(noise), self.input_dim, 1, 1)
        print(x.shape,"check the shape of last layer")
        return self.gen(x)

Here is the generator model. It was originally written for MNIST; for learning purposes I am adapting it to CIFAR-10. When I feed it the input, the generated images come out with the wrong spatial size.

fake = gen(noise_and_labels)

This returns a tensor of shape torch.Size([128, 3, 28, 28]).
I need the output to be [128, 3, 32, 32].

Can anyone tell me what I need to change in the model?


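Hi, the issue is that CIFAR-10 images are 32 x 32 while MNIST images are 28 x 28, and the generator is still sized for MNIST. You need to change the kernel size, stride, or padding of one or more blocks (changing any one of them can be enough).
For a regular convolution the output size is (W - F + 2P)/S + 1, where W is the input size, F the kernel size, P the padding and S the stride. Your generator uses nn.ConvTranspose2d, where the relation is inverted: output size = (W - 1) * S - 2P + F (with the default dilation and output_padding). So in your make_gen_block calls, change the kernel_size or stride, or add a padding parameter, until the last block outputs 32.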
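
To see where the 28 comes from, here is a small sketch that traces the spatial size through the four blocks using the nn.ConvTranspose2d size relation (dilation 1, no padding). The helper names conv_transpose_out and trace are made up for illustration, and the kernel_size=5 variant is just one possible change that happens to reach 32, not the only one.

```python
def conv_transpose_out(size, kernel_size, stride, padding=0, output_padding=0):
    """Output side length of a transposed convolution with dilation=1."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding


def trace(blocks, size=1):
    """Apply each (kernel_size, stride) block in order, starting from a size x size input."""
    sizes = [size]
    for kernel_size, stride in blocks:
        size = conv_transpose_out(size, kernel_size, stride)
        sizes.append(size)
    return sizes


# (kernel_size, stride) of the four blocks as posted in the question
mnist_blocks = [(3, 2), (4, 1), (3, 2), (4, 2)]
# Hypothetical variant: the third block uses kernel_size=5 instead of 3
cifar_blocks = [(3, 2), (4, 1), (5, 2), (4, 2)]

print(trace(mnist_blocks))  # [1, 3, 6, 13, 28]
print(trace(cifar_blocks))  # [1, 3, 6, 15, 32]
```

In the model, that variant would correspond to writing the third block as self.make_gen_block(hidden_dim * 2, hidden_dim, kernel_size=5). Other combinations of kernel size, stride, and padding can also reach 32, so treat this as one option to try.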