My main doubt is about how the strides in the nn.ConvTranspose2d layers work.
My convolutional autoencoder model is:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvAutoencoder(nn.Module):
        def __init__(self):
            super(ConvAutoencoder, self).__init__()
            # Encoder
            self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(in_channels=16, out_channels=4, kernel_size=3, padding=1)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
            # Decoder
            self.t_conv1 = nn.ConvTranspose2d(in_channels=4, out_channels=16, kernel_size=2, stride=2)
            self.t_conv2 = nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=2, stride=2)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = self.pool(x)
            x = F.relu(self.conv2(x))
            x = self.pool(x)
            x = F.relu(self.t_conv1(x))
            x = self.t_conv2(x)
            return x
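For reference, this is how I run it (a dummy grayscale batch I made up for testing):

    model = ConvAutoencoder()
    x = torch.randn(1, 1, 4, 8)  # batch of one 1-channel 4x8 image
    print(model(x).shape)        # torch.Size([1, 1, 4, 8])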
If the input has 1 channel and size 4x8, I understand that after the first pool the size is 2x4, and after the second it is 1x2. I also know that the two transpose layers bring it back to 4x8, but I don't understand how the kernel size and stride in the ConvTranspose2d layers combine to produce this.
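To make the question concrete, here is the trace of intermediate shapes I observe; the comments on the transpose layers use the output-size formula from the ConvTranspose2d docs, H_out = (H_in - 1)*stride - 2*padding + kernel_size + output_padding (with dilation=1):

    enc = ConvAutoencoder()
    h = torch.randn(1, 1, 4, 8)
    h = F.relu(enc.conv1(h));   print(h.shape)  # [1, 16, 4, 8]  (padding=1 keeps the size)
    h = enc.pool(h);            print(h.shape)  # [1, 16, 2, 4]
    h = F.relu(enc.conv2(h));   print(h.shape)  # [1, 4, 2, 4]
    h = enc.pool(h);            print(h.shape)  # [1, 4, 1, 2]
    h = F.relu(enc.t_conv1(h)); print(h.shape)  # [1, 16, 2, 4]  ((1-1)*2 + 2 = 2, (2-1)*2 + 2 = 4)
    h = enc.t_conv2(h);         print(h.shape)  # [1, 1, 4, 8]   (doubles again)

So with kernel_size=2, stride=2, padding=0 each transpose layer doubles both spatial dimensions, but I'd like to understand mechanically how the kernel and stride achieve that.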
Moreover, if the input were, for example, of size 4x6, I don't know how to get back to that size with the transpose layers, as the sketch below shows.
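Here is a minimal sketch of that problem case (the layer names match the model above; output_padding is my guess at the relevant knob, not something I have verified):

    x = torch.randn(1, 1, 4, 6)
    out = ConvAutoencoder()(x)
    print(out.shape)  # torch.Size([1, 1, 4, 4]) -- not [1, 1, 4, 6]
    # After the pools: 4x6 -> 2x3 -> 1x1 (MaxPool2d floors 3/2 to 1),
    # so two doubling transpose layers can only reach 4x4.
    # I suspect ConvTranspose2d's output_padding argument, e.g.
    #   nn.ConvTranspose2d(4, 16, kernel_size=2, stride=2, output_padding=(0, 1))
    # could add back the missing column after t_conv1 (1x1 -> 2x3, then 2x3 -> 4x6),
    # but I'm not sure this is the intended way to handle odd sizes.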