My main doubt is about how the strides in the nn.ConvTranspose2d layers work.
My convolutional autoencoder model is:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super(ConvAutoencoder, self).__init__()
            # Encoder
            self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(in_channels=16, out_channels=4, kernel_size=3, padding=1)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            # Decoder
            self.t_conv1 = nn.ConvTranspose2d(in_channels=4, out_channels=16, kernel_size=2, stride=2)
            self.t_conv2 = nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=2, stride=2)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = self.pool(x)
            x = F.relu(self.conv2(x))
            x = self.pool(x)
            x = F.relu(self.t_conv1(x))
            x = self.t_conv2(x)
            return x
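For reference, this is how I run it (a dummy grayscale batch I made up for testing):

    model = ConvAutoencoder()
    x = torch.randn(1, 1, 4, 8)  # batch of one 1-channel 4x8 image
    print(model(x).shape)        # torch.Size([1, 1, 4, 8])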
If the input has 1 channel and size 4x8, I understand that after the first pool the size is 2x4, and after the second it is 1x2. I also know that the two transpose layers bring it back to 4x8, but I don't understand how the kernel size and stride in the ConvTranspose2d layers combine to produce this.
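To make the question concrete, here is the trace of intermediate shapes I observe; the comments on the transpose layers use the output-size formula from the ConvTranspose2d docs, H_out = (H_in - 1)*stride - 2*padding + kernel_size + output_padding (with dilation=1):

    enc = ConvAutoencoder()
    h = torch.randn(1, 1, 4, 8)
    h = F.relu(enc.conv1(h));   print(h.shape)  # [1, 16, 4, 8]  (padding=1 keeps the size)
    h = enc.pool(h);            print(h.shape)  # [1, 16, 2, 4]
    h = F.relu(enc.conv2(h));   print(h.shape)  # [1, 4, 2, 4]
    h = enc.pool(h);            print(h.shape)  # [1, 4, 1, 2]
    h = F.relu(enc.t_conv1(h)); print(h.shape)  # [1, 16, 2, 4]  ((1-1)*2 + 2 = 2, (2-1)*2 + 2 = 4)
    h = enc.t_conv2(h);         print(h.shape)  # [1, 1, 4, 8]   (doubles again)

So with kernel_size=2, stride=2, padding=0 each transpose layer doubles both spatial dimensions, but I'd like to understand mechanically how the kernel and stride achieve that.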
Moreover, if the input were, for example, of size 4x6, I don't know how to get back to that size with the transpose layers, as the sketch below shows.
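Here is a minimal sketch of that problem case (the layer names match the model above; output_padding is my guess at the relevant knob, not something I have verified):

    x = torch.randn(1, 1, 4, 6)
    out = ConvAutoencoder()(x)
    print(out.shape)  # torch.Size([1, 1, 4, 4]) -- not [1, 1, 4, 6]
    # After the pools: 4x6 -> 2x3 -> 1x1 (MaxPool2d floors 3/2 to 1),
    # so two doubling transpose layers can only reach 4x4.
    # I suspect ConvTranspose2d's output_padding argument, e.g.
    #   nn.ConvTranspose2d(4, 16, kernel_size=2, stride=2, output_padding=(0, 1))
    # could add back the missing column after t_conv1 (1x1 -> 2x3, then 2x3 -> 4x6),
    # but I'm not sure this is the intended way to handle odd sizes.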