Conceptual doubt about ConvTranspose2d and stride

Hi everyone!

I mainly have doubts about how the stride in nn.ConvTranspose2d layers works.

My model of a convolutional autoencoder is:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super(ConvAutoencoder, self).__init__()

            # Encoder: two 3x3 "same" convolutions, each followed by 2x2 max pooling
            self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(in_channels=16, out_channels=4, kernel_size=3, padding=1)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

            # Decoder: each transposed convolution doubles the spatial size
            self.t_conv1 = nn.ConvTranspose2d(in_channels=4, out_channels=16, kernel_size=2, stride=2)
            self.t_conv2 = nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=2, stride=2)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = self.pool(x)
            x = F.relu(self.conv2(x))
            x = self.pool(x)
            x = F.relu(self.t_conv1(x))
            x = self.t_conv2(x)
            return x
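
For reference, instantiating the model and pushing a 4x8 input through it confirms that the shapes round-trip (a quick check using the code above):

    model = ConvAutoencoder()
    x = torch.randn(1, 1, 4, 8)     # batch of one 1-channel 4x8 image
    print(model(x).shape)           # torch.Size([1, 1, 4, 8])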

If the input has 1 channel and size 4x8, I understand that after the first pool the size is 2x4, and after the second it is 1x2. I know that the two transpose layers bring it back to 4x8, but I don’t understand how the kernel size and stride of the ConvTranspose2d layers combine to produce this.
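
The general rule, per spatial dimension, is given in the PyTorch docs for ConvTranspose2d:

    H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

With kernel_size=2, stride=2, and no padding this reduces to H_out = 2 * H_in: each input pixel is projected onto a 2x2 patch of the output, and stride=2 places neighbouring patches 2 pixels apart, so the patches tile the output without overlap and every spatial dimension exactly doubles. A minimal check on a single layer (reusing the imports above):

    t_conv = nn.ConvTranspose2d(in_channels=4, out_channels=16, kernel_size=2, stride=2)
    z = torch.randn(1, 4, 1, 2)   # the 1x2 bottleneck from above
    print(t_conv(z).shape)        # torch.Size([1, 16, 2, 4]) -- 1x2 doubled to 2x4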

Moreover, if the input were of size 4x6, for example, I don’t know how to get back to that size with the transpose layers.
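
One way to handle this (a sketch, assuming the encoder stays as above) is the output_padding argument of nn.ConvTranspose2d, which adds extra size to one side of the output. With a 4x6 input the encoder produces 4x6 -> 2x3 -> 1x1 (pooling floors the odd width 3 down to 1), so plain doubling only gets back to 4x4; output_padding=(0, 1) on the first transpose layer restores the missing width:

    t_conv1 = nn.ConvTranspose2d(4, 16, kernel_size=2, stride=2, output_padding=(0, 1))
    t_conv2 = nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2)

    z = torch.randn(1, 4, 1, 1)           # bottleneck for a 4x6 input
    print(t_conv2(t_conv1(z)).shape)      # torch.Size([1, 1, 4, 6]) -- width 1 -> 3 -> 6

Note that this only repairs the shape: the pooled-away column is already lost, so in practice it is also common to pad or resize inputs to sizes divisible by 4.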

A guide to convolution arithmetic for deep learning is a great reference, as it visualizes how different convolutions are applied. The accompanying repository also provides some animations.