Question about ConvTranspose


I’ve recently been reading about GANs and went through the usual tutorials (MNIST, and I also tried generating pokemons). For the pokemon net, I followed the DCGAN architecture and made sure the images were of size 3x64x64.

Now I want to try another dataset. In this case the images are 3x190x190 (is this too big?), and I was tweaking the padding and stride of the generator model to hit this 190x190 size when it occurred to me that the process may not be as naive as that. Is there any protocol or rule of thumb for adjusting padding, stride, and kernel size while still preserving the relevant information?

I apologize if this sounds stupid; I’m not yet used to GANs and CNNs.

Thanks a lot!

Couldn’t you downsample your images to 3x64x64, using something like torchvision.transforms.Resize?
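One way to do that downsampling, sketched with torch.nn.functional.interpolate as a stand-in for whatever resize transform you end up using (the 3x190x190 shape is the one from the question; the batch size of 8 is arbitrary):

```python
import torch
import torch.nn.functional as F

# A dummy batch of 3x190x190 images, bilinearly resized down to 3x64x64.
imgs = torch.randn(8, 3, 190, 190)
small = F.interpolate(imgs, size=(64, 64), mode="bilinear", align_corners=False)
print(small.shape)  # torch.Size([8, 3, 64, 64])
```

In a real pipeline you would apply the resize per image at load time (e.g. inside a Dataset), but the arithmetic is the same.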

1 Like

It is easiest to work with images whose spatial dimensions are powers of two. So, to build on richard’s advice, just scale your pokemons at import with torchvision.transforms.Resize(256) or something, and then you can use the layer args from DCGAN like this:

import torch
import torch.nn as nn

x = torch.randn(1, 100, 1, 1)

layer = nn.ConvTranspose2d(100, 100, 4, stride=2, padding=1, bias=False)

out = layer(x)  # [1, 100, 1, 1] -> [1, 100, 2, 2]

# out = layer(out)
# uncommenting the line above makes out.shape == torch.Size([1, 100, 4, 4])

Repeating layers with these args doubles the spatial dimensions of your images at each step.
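As a minimal sketch of that doubling (the channel counts here are arbitrary placeholders, not DCGAN’s), six such layers stacked take a 1x1 input up to 64x64:

```python
import torch
import torch.nn as nn

# Six identical (kernel=4, stride=2, padding=1) transposed convolutions:
# each one doubles the spatial size, so 1 -> 2 -> 4 -> 8 -> 16 -> 32 -> 64.
up = nn.Sequential(*[
    nn.ConvTranspose2d(100, 100, 4, stride=2, padding=1, bias=False)
    for _ in range(6)
])

z = torch.randn(1, 100, 1, 1)
print(up(z).shape)  # torch.Size([1, 100, 64, 64])
```

A real DCGAN generator also shrinks the channel count and adds BatchNorm/ReLU between layers; this only shows the spatial bookkeeping.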

You can also, of course, target your specific size. But I don’t know of a formula that will predict output sizes for ConvTranspose. CS231n has one for Conv layers (about 1/3 of the way down) so that might provide some insight.
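For what it’s worth, the PyTorch ConvTranspose2d docs do give the output-size arithmetic. Here it is as a small helper; the 95 -> 190 check shows one way the standard (kernel=4, stride=2, padding=1) args can reach the 190x190 target from the question:

```python
def convtranspose2d_out(h_in, kernel, stride=1, padding=0,
                        output_padding=0, dilation=1):
    """Output size along one spatial dimension, per the PyTorch
    ConvTranspose2d documentation."""
    return ((h_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

print(convtranspose2d_out(1, 4, stride=2, padding=1))   # 2  (the example above)
print(convtranspose2d_out(95, 4, stride=2, padding=1))  # 190 (doubling 95 hits 190)
```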

Good luck!

1 Like

I finally figured it out. Thanks for the 2^n tips, they were useful!

Hi gentlemen. If you want to control the size of the output of any convolution operation, you can follow the arithmetic here
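For the forward direction, the corresponding arithmetic (as given in the PyTorch Conv2d docs) can be sketched the same way; for example, the usual DCGAN discriminator args halve 64 down to 32:

```python
def conv2d_out(h_in, kernel, stride=1, padding=0, dilation=1):
    """Output size along one spatial dimension, per the PyTorch
    Conv2d documentation (floor division models the floor in the formula)."""
    return (h_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

print(conv2d_out(64, 4, stride=2, padding=1))  # 32: kernel 4, stride 2, padding 1 halves it
```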

1 Like

great resource, thanks!