It’s the first time I’m trying my hand at a Convolutional Autoencoder model. More specifically, I would like to replicate the model proposed in the paper “Deconvolutional Paragraph Representation Learning”. While the paper is about text, my current issues are with the Conv/Deconv setup in general. In principle, I have the model ready and it seems to be training. However, I’ve run into a stumbling block that I’m not sure how to handle best.
To be as close as possible to the paper, all the Conv layers use stride=2 as a replacement for pooling/unpooling to improve training time (I’m not arguing that this is a valid alternative here; I’m just sticking to the paper). However, with stride=2, and depending on the kernel sizes and the length of the input sentences, the ConvTranspose layers (initialized with the same parameters as the Conv layers) might not yield the matching output sizes L_out. Given that the formula for the Conv output size includes a floor, it’s easy to see why this happens.
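To make the floor effect concrete, here is a small sketch that evaluates the output-size formulas from the PyTorch Conv1d/ConvTranspose1d documentation directly (the function names are mine, just for illustration):

```python
import math

def conv1d_out(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # Conv1d: L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

def convtranspose1d_out(l_in, kernel_size, stride=1, padding=0, output_padding=0, dilation=1):
    # ConvTranspose1d: L_out = (L_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1
    return (l_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

l_in = 30
enc = conv1d_out(l_in, kernel_size=5, stride=2)        # floor drops the fractional part -> 13
dec = convtranspose1d_out(enc, kernel_size=5, stride=2)
print(enc, dec)  # 13 29 -- one short of the original 30
```

Because the floor discards information about the exact input length, several input lengths map to the same encoded length, and the transposed conv alone cannot know which one to reconstruct.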
Currently I handle this by setting output_padding in the ConvTranspose layers to 0 or 1 to make up for the “mistakes”. While this works, I have to make this adjustment manually every time I change the kernel sizes or seq_len. I guess I could calculate the values for output_padding programmatically.
Is there any more straightforward method to ensure that the output sizes match up, or is it simply always up to me to get all the numbers right manually?
If my problem is not quite clear, here’s a minimal example:
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=5, kernel_size=5, stride=2)
deconv = nn.ConvTranspose1d(in_channels=5, out_channels=1, kernel_size=5, stride=2)

inputs = [[[1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0]]]
inputs = torch.tensor(inputs, dtype=torch.float32)

encoded = conv(inputs)
decoded = deconv(encoded)

print("encoded.shape =", encoded.shape)
print("decoded.shape =", decoded.shape)
print("target.shape =", inputs.shape)

=> encoded.shape = torch.Size([1, 5, 13])
=> decoded.shape = torch.Size([1, 1, 29])
=> target.shape = torch.Size([1, 1, 30])
The decoded size is 29 and does not match the target size of 30. In case I want to keep kernel_size=5 as in the paper, I have two alternatives to fix that:
- set output_padding=1 in the ConvTranspose1d, so the output and target size are both 30
- increase the input size by 1, so both output and target size are 31
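For what it’s worth, the required output_padding does not have to be hand-tuned: it is just the difference between the target length and the length the transposed conv would produce with output_padding=0. A hypothetical helper along these lines (the name required_output_padding is mine, not from the paper or PyTorch):

```python
def required_output_padding(target_len, in_len, kernel_size, stride, padding=0, dilation=1):
    # Length a ConvTranspose1d would produce with output_padding=0
    # (formula from the PyTorch ConvTranspose1d documentation).
    out_len = (in_len - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + 1
    pad = target_len - out_len
    # PyTorch requires output_padding to be smaller than stride (or dilation).
    if not 0 <= pad < max(stride, dilation):
        raise ValueError(f"cannot reach length {target_len} with these parameters")
    return pad

# The mismatch from the minimal example above: encoded length 13, target length 30.
print(required_output_padding(target_len=30, in_len=13, kernel_size=5, stride=2))  # 1
```

The returned value can then be passed straight to nn.ConvTranspose1d(..., output_padding=...) when the decoder is built, so changing kernel_size or seq_len no longer requires manual re-tuning.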