How can we provide the output of a Linear layer to a Conv2D

I am building an Autoencoder where I need to encode an image into a latent representation of length 100. I am using the following architecture for my model.

        # Encoder: for a 128x128 input, the feature maps go 126x126 -> 62x62 -> 30x30
        self.conv1 = nn.Conv2d(in_channels = 3, out_channels = 32, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size=3,stride=2)
        self.conv3 = nn.Conv2d(in_channels=64,out_channels=128,kernel_size=3,stride=2)

        # Latent representation of length 100
        self.linear = nn.Linear(in_features=128*30*30,out_features=100)

        # Decoder: 30x30 -> 62x62 -> 126x126 -> 128x128
        self.conv1_transpose = nn.ConvTranspose2d(in_channels=128,out_channels=64,kernel_size=3,stride=2,output_padding=1)
        self.conv2_transpose = nn.ConvTranspose2d(in_channels=64,out_channels=32,kernel_size=3,stride=2,output_padding=1)
        self.conv3_transpose = nn.ConvTranspose2d(in_channels=32,out_channels=3,kernel_size=3,stride=1)

Is there any way I could give my Linear layer’s output to a Conv2D or a ConvTranspose2D layer so that I can reconstruct my image? The reconstruction works fine if I remove the Linear layer; I want to know how I can reconstruct the image while keeping the Linear layer.
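Roughly, the end of my encoder’s forward looks like this (simplified sketch, activations omitted):

    x = self.conv3(x)                     # [batch, 128, 30, 30]
    x = x.view(x.size(0), 128*30*30)      # [batch, 115200]
    x = self.linear(x)                    # [batch, 100]  <- how do I feed this to a conv?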

Any help would be appreciated. Thanks!

Yes, that should be possible if you are able to create a view of the tensor in the expected shape.
nn.ConvTranspose2d expects an input in the shape [batch_size, channels, height, width].
Since your linear layer is returning 100 output features, you won’t be able to use in_channels=128, but would have to lower it.
You could use out = out.view(out.size(0), 4, 5, 5) on the output of the linear layer (since 4 * 5 * 5 = 100) and pass it to the transposed conv.
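For example, something along these lines (just a rough sketch; the out_channels and kernel size of the transposed conv are arbitrary here):

    import torch
    import torch.nn as nn

    out = torch.randn(8, 100)                   # output of the linear layer: [batch, 100]
    out = out.view(out.size(0), 4, 5, 5)        # view it as [batch, 4, 5, 5]

    # example transposed conv taking 4 input channels (other parameters are arbitrary)
    deconv = nn.ConvTranspose2d(in_channels=4, out_channels=64, kernel_size=3, stride=2)
    print(deconv(out).shape)                    # torch.Size([8, 64, 11, 11])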

Thanks @ptrblck.
I tried another way: I just added another linear layer, self.linear2 = nn.Linear(in_features=100, out_features=128*30*30), and in the forward function I used x = x.view(x.size(0), 128, 30, 30) and kept the rest of the code the same. When I print the output after every layer in the forward method, I get the desired shapes. But when I start training, I get the following error, despite resizing all images to (128, 128) in my dataloader:

    RuntimeError: shape '[8, 115200]' is invalid for input of size 2113536

Could you please have a look at my code (Google Colab notebook).

Just for reference, here is the sequence of shapes when I pass an input through my network:

torch.Size([1, 3, 128, 128])
torch.Size([1, 32, 126, 126])
torch.Size([1, 64, 62, 62])
torch.Size([1, 128, 30, 30])
torch.Size([1, 115200])
torch.Size([1, 100])
torch.Size([1, 115200])
torch.Size([1, 128, 30, 30])
torch.Size([1, 64, 62, 62])
torch.Size([1, 32, 126, 126])
torch.Size([1, 3, 128, 128])
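The forward pass that produces these shapes looks roughly like this (simplified, activation functions omitted):

    def forward(self, x):
        x = self.conv1(x)                    # [1, 32, 126, 126]
        x = self.conv2(x)                    # [1, 64, 62, 62]
        x = self.conv3(x)                    # [1, 128, 30, 30]
        x = x.view(x.size(0), 128*30*30)     # [1, 115200]
        x = self.linear(x)                   # [1, 100]  (latent representation)
        x = self.linear2(x)                  # [1, 115200]
        x = x.view(x.size(0), 128, 30, 30)   # [1, 128, 30, 30]
        x = self.conv1_transpose(x)          # [1, 64, 62, 62]
        x = self.conv2_transpose(x)          # [1, 32, 126, 126]
        x = self.conv3_transpose(x)          # [1, 3, 128, 128]
        return x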

Even though you are defining a valid Resize transformation, you are not applying it in your CustomDataset, but only use ToTensor().
If your images have different spatial sizes, this shape mismatch error will be raised.
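Something along these lines should fix it (a minimal sketch, assuming the transform is stored as self.transform in the CustomDataset and the images are loaded as PIL images):

    import torchvision.transforms as transforms

    transform = transforms.Compose([
        transforms.Resize((128, 128)),  # resize every image to 128x128 first
        transforms.ToTensor(),          # then convert the PIL image to a tensor
    ])

    # inside CustomDataset.__getitem__:
    # img = self.transform(img)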

Oh my god, I can’t believe I missed that! Thanks a lot. It works perfectly now.
