Decoder to reconstruct an image of a specific size

Hi guys, I am using an auto-encoder in an atypical way.
Starting from a code of batch_size x 64 (the encoder output), I have to reconstruct, via the decoder, an image of size (BS x 3 x 500 x 500). Here is the code of the decoder:

import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, encoder):
        super(AutoEncoder, self).__init__()
        self.encoder = encoder

        self.decoder = nn.Sequential(
            nn.Conv2d(1, 64, 3),
            nn.Conv2d(64, 64, 3),
            nn.Upsample(scale_factor=5, mode='bicubic'),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3),
            nn.Conv2d(128, 128, 3),
            nn.Upsample(scale_factor=5, mode='bicubic'),
            nn.ReLU(),
            nn.Conv2d(128, 256, 3),
            nn.Conv2d(256, 256, 3),
            nn.ReLU(),
            nn.Conv2d(256, 512, 3),
            nn.Conv2d(512, 512, 3),
            nn.Upsample(scale_factor=2.5, mode='bicubic'),
            nn.ReLU(),
            nn.Conv2d(512, 1024, 3),
            nn.Conv2d(1024, 1024, 3),
            nn.ReLU(),
            nn.Conv2d(1024, 2048, 3),
            nn.Conv2d(2048, 3, 3)
        )

    def forward(self, x):
        code = self.encoder(x)
        # reshape the flat BS x 64 code into a 4D tensor for the conv decoder
        code = code.view(1, 1, 8, 8)
        print("code ", code.shape)
        reconstructed = self.decoder(code)
        print("rec ", reconstructed.shape)
        reconstructed = reconstructed.view(1, 3, 500, 500)

        return code, reconstructed

As you can see, in the forward pass I reshape the code into a 4-dimensional tensor of size 1 x 1 x 8 x 8 (the batch size is 1 because of CUDA memory).
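
As a side note: to avoid hardcoding the batch size in the reshape, something like this should also work (just a sketch, assuming the encoder always outputs BS x 64):

code = code.view(code.size(0), 1, 8, 8)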

I am getting some step wrong in reconstructing a 4-dimensional image of size (1 x 3 x 500 x 500).
Can you help me? I think padding can help (I tried many values in the first Conv2d), but I don't know how.


You can add padding=1 to every Conv2d, like this:

self.decoder = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1),
    nn.Conv2d(64, 64, 3, padding=1),
    nn.Upsample(scale_factor=5, mode='bicubic'),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
    nn.Conv2d(128, 128, 3, padding=1),
    nn.Upsample(scale_factor=5, mode='bicubic'),
    nn.ReLU(),
    nn.Conv2d(128, 256, 3, padding=1),
    nn.Conv2d(256, 256, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(256, 512, 3, padding=1),
    nn.Conv2d(512, 512, 3, padding=1),
    nn.Upsample(scale_factor=2.5, mode='bicubic'),
    nn.ReLU(),
    nn.Conv2d(512, 1024, 3, padding=1),
    nn.Conv2d(1024, 1024, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(1024, 2048, 3, padding=1),
    nn.Conv2d(2048, 3, 3, padding=1)
)

Basically, a simple rule of thumb (for stride 1 and dilation 1) is:

output_size (height or width) = input_size - kernel_size + 2 * padding + 1

Of course, the equation is more complicated when you use strided or dilated convolutions; in general it is output = floor((input + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1).
So, without padding, each 3x3 conv layer shrinks the feature map by 2 pixels (e.g. 8 -> 6), and you never achieve the 62.5x (5 * 5 * 2.5) upsampling you need to go from 8x8 to 500x500. With padding=1, each conv preserves the spatial size, so only the Upsample layers change it: 8 -> 40 -> 200 -> 500.
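
To double-check the shapes, you can run a dummy tensor through the padded decoder (a quick sketch; here decoder stands for the padded nn.Sequential above, and the random input mimics your 1 x 1 x 8 x 8 code):

import torch

x = torch.randn(1, 1, 8, 8)   # dummy code, batch size 1
with torch.no_grad():
    out = decoder(x)          # decoder = the padded nn.Sequential from above
print(out.shape)              # expect torch.Size([1, 3, 500, 500])
# spatial trace: 8 -> 40 (x5) -> 200 (x5) -> 500 (x2.5); the padded 3x3 convs keep the size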