I am building an Autoencoder where I need to encode an image into a latent representation of length 100. I am using the following architecture for my model.
Is there any way I could feed my Linear layer’s output to a Conv2d or ConvTranspose2d layer so that I can reconstruct my image? The image is reconstructed correctly if I remove the Linear layer. I want to know how I can reconstruct my image while keeping the Linear layer.
Yes, that should be possible if you are able to create a view of the tensor in the expected shape. nn.ConvTranspose2d expects an input of shape [batch_size, channels, height, width].
Since your linear layer returns 100 output features, you won’t be able to use in_channels=128, but will have to lower it.
You could use out = out.view(out.size(0), 4, 5, 5) on the output of the linear layer and pass it to the transposed conv.
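A minimal sketch of this idea (the layer sizes here are assumptions for illustration, not the original model): the 100-dimensional latent vector is reshaped to [batch_size, 4, 5, 5] so it can be fed into an nn.ConvTranspose2d with in_channels=4.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        # 100 latent features reshape cleanly into 4 * 5 * 5
        self.linear = nn.Linear(in_features=100, out_features=100)
        self.deconv = nn.ConvTranspose2d(
            in_channels=4, out_channels=3,
            kernel_size=4, stride=2, padding=1)

    def forward(self, z):
        out = self.linear(z)
        out = out.view(out.size(0), 4, 5, 5)  # [batch_size, 4, 5, 5]
        return self.deconv(out)               # [batch_size, 3, 10, 10]

z = torch.randn(8, 100)
print(Decoder()(z).shape)  # torch.Size([8, 3, 10, 10])
```

From there you would stack further transposed convolutions until the spatial size matches the input image.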
Thanks @ptrblck.
I tried another approach: I added another linear layer, self.linear2 = nn.Linear(in_features=100, out_features=128*30*30), and in the forward function I used x = x.view(x.size(0), 128, 30, 30), keeping the rest of the code the same. When I print the output after every layer in the forward method, I get the desired shapes. But when I start training, I get the following error, despite resizing all images to (128, 128) in my dataloader: RuntimeError: shape '[8, 115200]' is invalid for input of size 2113536
Could you please have a look at my code (Google Colab notebook).
Even though you are defining a valid Resize transformation, you are not applying it in your CustomDataset; instead you only call ToTensor().
If your images have different spatial sizes, this shape mismatch error will be raised.