Deconvolution for non-square images

smu226 · April 18, 2020, 10:15pm

Hello! I am trying to build a network that reconstructs images from a 1D vector. However, the images are not square (they are 15x27). What is the best way to do that? ConvTranspose2d seems to be able to return only square images. Thank you!

ptrblck · April 19, 2020, 12:01am

This should not be the case:

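import torch
import torch.nn as nn

# in_channels=1, out_channels=3, kernel_size=4, stride=2
conv = nn.ConvTranspose2d(1, 3, 4, 2)
x = torch.randn(1, 1, 15, 27)  # non-square input
out = conv(x)
print(out.shape)
> torch.Size([1, 3, 32, 56])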

You could also define the parameters of your transposed convolution for each spatial dimension separately, if needed.
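As a minimal sketch of setting the parameters per spatial dimension: kernel_size, stride and padding each accept a (height, width) tuple. The specific values below are illustrative assumptions, not something taken from the posts above.

```python
import torch
import torch.nn as nn

# Each tuple is (height, width); the values are chosen only for illustration.
conv = nn.ConvTranspose2d(
    in_channels=1,
    out_channels=3,
    kernel_size=(3, 5),
    stride=(2, 1),
    padding=(1, 2),
)

x = torch.randn(1, 1, 15, 27)
out = conv(x)

# Height: (15 - 1) * 2 - 2 * 1 + (3 - 1) + 1 = 29
# Width:  (27 - 1) * 1 - 2 * 2 + (5 - 1) + 1 = 27
print(out.shape)  # torch.Size([1, 3, 29, 27])
```

Here the height is roughly doubled while the width is left unchanged, which is the kind of independent control that helps when targeting a non-square output.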

smu226 · April 19, 2020, 12:32am

I am sorry if I explained that the wrong way. I start with a 1D vector and want to end up with a 15x27 image; I don’t start with that image. So my input is (32, 5), where 32 is just the batch size, and the output should be (32, 2, 15, 27), i.e. a batch of images with 2 channels and size 15x27.

ptrblck · April 19, 2020, 1:07am

There are of course multiple ways of achieving the desired output shape and this would be one:

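import torch
import torch.nn as nn

x = torch.randn(32, 5)
x = x.unsqueeze(2).unsqueeze(2)  # (32, 5) -> (32, 5, 1, 1)

model = nn.Sequential(
    nn.ConvTranspose2d(5, 2, (2, 2), 2),  # 1x1  -> 2x2
    nn.ConvTranspose2d(2, 2, (2, 4), 2),  # 2x2  -> 4x6
    nn.ConvTranspose2d(2, 2, (2, 3), 2),  # 4x6  -> 8x13
    nn.ConvTranspose2d(2, 2, (1, 3), 2),  # 8x13 -> 15x27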
)

out = model(x)
print(out.shape)
> torch.Size([32, 2, 15, 27])

smu226 · April 26, 2020, 12:10am

Thanks a lot for this! I tried it and I do get the right shape and the NN is training, however the images look a bit weird (I attached an example of input/output after the NN is trained). Using a similar architecture (which is basically an autoencoder), I get almost perfect reconstruction for square images. Also, the error seems to be systematic, i.e. if I train the NN with different initializations I get the same stripes in the output, for all the images. Do you know what could cause this? I haven’t used non-square images before. Could it be because of the way the deconvolution is done (it could just as well be a bug in my code, but it is worth asking here)? Thank you!

ptrblck · April 26, 2020, 1:58am

Thanks for getting back!
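I guess the pattern is created by the last nn.ConvTranspose2d layer, which uses a kernel height of 1 and a stride of 2 in both spatial dimensions.
With a kernel height of 1 and a stride of 2, each input row writes to only one output row, so every second output row gets no contribution from the input and contains just the bias. Sorry for missing this.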
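To see the effect in isolation, here is a small sketch that runs only the last layer on a random 8x13 input (the spatial size it receives in the model above) and checks the odd-numbered output rows. The variable names are just for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same settings as the last layer above: kernel (1, 3), stride 2.
layer = nn.ConvTranspose2d(2, 2, (1, 3), 2)
x = torch.randn(1, 2, 8, 13)

with torch.no_grad():
    out = layer(x)

# Input row i only writes to output row 2 * i, so rows 1, 3, 5, ...
# never receive any input and end up equal to the per-channel bias.
odd_rows = out[:, :, 1::2, :]
bias = layer.bias.view(1, -1, 1, 1).expand_as(odd_rows)
only_bias = torch.allclose(odd_rows, bias)

print(out.shape)   # torch.Size([1, 2, 15, 27])
print(only_bias)   # True
```

Since those rows are constant regardless of the input, the stripes would show up in every image and with every initialization.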

You could of course change the overall architecture and try to use overlapping or neighboring kernels.
This is a quick and dirty fix using the current architecture and increasing the last kernel dimension:

model = nn.Sequential(
    nn.ConvTranspose2d(5, 2, (2, 2), 2),
    nn.ConvTranspose2d(2, 2, (2, 4), 2),
    nn.ConvTranspose2d(2, 2, (2, 3), 2),
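    # kernel height 2 so that every output row is written to; padding and
    # output_padding along the height keep the output at 15 rows
    nn.ConvTranspose2d(2, 2, (2, 3), stride=2, padding=(1, 0), output_padding=(1, 0)),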
)
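A quick sanity check of this adjusted model, as a sketch: it confirms the output shape and then uses all-ones weights and inputs with a zero bias (an artificial setup, only used to make coverage visible) to check that every output position of the last layer receives some input.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.ConvTranspose2d(5, 2, (2, 2), 2),
    nn.ConvTranspose2d(2, 2, (2, 4), 2),
    nn.ConvTranspose2d(2, 2, (2, 3), 2),
    nn.ConvTranspose2d(2, 2, (2, 3), stride=2, padding=(1, 0), output_padding=(1, 0)),
)

out = model(torch.randn(32, 5, 1, 1))
print(out.shape)  # torch.Size([32, 2, 15, 27])

# Coverage check on the last layer only: with weights of 1, a bias of 0 and
# an all-ones input, any output position that is reached by at least one
# input element must be strictly positive.
last = model[-1]
with torch.no_grad():
    last.weight.fill_(1.0)
    last.bias.zero_()
    coverage = last(torch.ones(1, 2, 8, 13))

all_covered = bool((coverage > 0).all())
print(all_covered)  # True
```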

Let me know if that works better.

smu226 · April 26, 2020, 5:09am

Thanks a lot! It works perfectly now!

Angry_potato · April 26, 2020, 6:24am

Hi,
I would like to know how you are training your network. Are you using a GAN-type approach?

smu226 · April 26, 2020, 6:55am

I am doing a VAE approach actually.

danielgonzalezcortes · January 11, 2021, 5:02pm

Hi @ptrblck, how do you work out the layer structure needed to reach that output? For example, how can I modify your example to reach an output of torch.Size([32, 1, 101, 27]) or torch.Size([32, 1, 101, 35])?

What is the math/equations behind this?

Thank you very much!

ptrblck · January 19, 2021, 9:44am

The formula to calculate the output shape is given in the docs of nn.ConvTranspose2d.
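For reference, along each spatial dimension the documented formula is out = (in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. A small plain-Python sketch of it follows; the helper names conv_transpose_out_size and trace are made up for this example and are not part of PyTorch. It traces the height and width through the adjusted model from earlier in this thread.

```python
def conv_transpose_out_size(size, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length along one spatial dimension of a transposed convolution."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def trace(layers, size=1):
    """Apply the formula layer by layer and collect the intermediate sizes."""
    sizes = [size]
    for cfg in layers:
        size = conv_transpose_out_size(size, **cfg)
        sizes.append(size)
    return sizes


# Height and width settings of the four layers in the adjusted model.
height_layers = [
    dict(kernel_size=2, stride=2),
    dict(kernel_size=2, stride=2),
    dict(kernel_size=2, stride=2),
    dict(kernel_size=2, stride=2, padding=1, output_padding=1),
]
width_layers = [
    dict(kernel_size=2, stride=2),
    dict(kernel_size=4, stride=2),
    dict(kernel_size=3, stride=2),
    dict(kernel_size=3, stride=2),
]

heights = trace(height_layers)
widths = trace(width_layers)
print(heights)  # [1, 2, 4, 8, 15]
print(widths)   # [1, 2, 6, 13, 27]
```

Changing the per-layer settings in the two lists and re-running is a cheap way to try candidate architectures for a different target size before building the actual model.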
You could work out an approximate architecture manually (e.g. start with layers that double the spatial size) and “fine-tune” the layers afterwards in case the final output size doesn’t match.
Note that you can add print statements into the forward method of your model to get the output shapes of intermediate tensors, which would make creating the model a bit easier.
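A minimal sketch of the print-statement approach, assuming the layers from the adjusted model earlier in this thread are wrapped in a custom module (the class name Decoder is hypothetical):

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):  # hypothetical name, used only for this example
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.ConvTranspose2d(5, 2, (2, 2), 2),
            nn.ConvTranspose2d(2, 2, (2, 4), 2),
            nn.ConvTranspose2d(2, 2, (2, 3), 2),
            nn.ConvTranspose2d(2, 2, (2, 3), stride=2,
                               padding=(1, 0), output_padding=(1, 0)),
        ])

    def forward(self, x):
        # (batch, 5) -> (batch, 5, 1, 1) so the vector can be upsampled
        x = x.unsqueeze(2).unsqueeze(2)
        print("input:", tuple(x.shape))
        for i, layer in enumerate(self.layers):
            x = layer(x)
            # Print the shape after every layer while designing the model.
            print(f"layer {i}:", tuple(x.shape))
        return x


out = Decoder()(torch.randn(32, 5))
# input: (32, 5, 1, 1)
# layer 0: (32, 2, 2, 2)
# layer 1: (32, 2, 4, 6)
# layer 2: (32, 2, 8, 13)
# layer 3: (32, 2, 15, 27)
```

Once the shapes match the target, the print calls can simply be removed.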

danielgonzalezcortes · January 21, 2021, 10:04am

Thank you very much @ptrblck, this helps me a lot.