I’ve recently been reading about GANs and went through the usual tutorials (MNIST, and I also tried generating Pokémon). For the Pokémon net, I followed the DCGAN architecture and made sure the images were of size 3x64x64.
Now I want to try another dataset. In this case the images are 3x190x190 (is this too large?), and I was tweaking the padding and stride of the Generator model to reach this 190x190 size when it occurred to me that the process might not be as naive as that. Is there any protocol or rule of thumb for adjusting padding, stride, and kernel size while still preserving relevant information?
I apologize if this sounds stupid; I’m not yet used to GANs and CNNs.
Thanks a lot !
It is easiest to work with images whose spatial dimensions are powers of two. So to build on richard’s advice, just scale your Pokémon at import with
torchvision.transforms.Resize(256) or something, and then you can use the layer args from DCGAN like this:
x = torch.randn(1, 100, 1, 1)
layer = nn.ConvTranspose2d(100, 100, 4, stride=2, padding=1, bias=False)
out = layer(x)  # [1, 100, 1, 1] -> [1, 100, 2, 2]
# out = layer(out)
# uncommenting the line above makes out.shape == torch.Size([1, 100, 4, 4])
Repeating layers with these args will double the spatial dimensions of your images at each step.
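To make the doubling concrete, here is a minimal sketch (channel counts are arbitrary, just for illustration) that stacks four of those (kernel=4, stride=2, padding=1) blocks to go from 4x4 up to 64x64:

```python
import torch
import torch.nn as nn

# Each ConvTranspose2d(kernel=4, stride=2, padding=1) doubles the spatial size:
# 4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64
up = nn.Sequential(
    nn.ConvTranspose2d(100, 64, 4, stride=2, padding=1, bias=False),  # 4 -> 8
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1, bias=False),   # 8 -> 16
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1, bias=False),   # 16 -> 32
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1, bias=False),    # 32 -> 64
)

x = torch.randn(1, 100, 4, 4)
print(up(x).shape)  # torch.Size([1, 3, 64, 64])
```

(A real DCGAN generator would also interleave BatchNorm and ReLU between these layers; they don’t change the spatial arithmetic.)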
You can also, of course, target your specific size, but I don’t know of a formula that will predict output sizes for ConvTranspose. CS231n has one for Conv layers (about 1/3 of the way down), so that might provide some insight.
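For what it’s worth, the PyTorch ConvTranspose2d docs do give an output-size formula; here is a small sketch of it as a plain function (`conv_transpose_out` is just an illustrative name), checked against the snippet above and against one way to reach 190:

```python
def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0, dilation=1):
    """Output size formula from the PyTorch ConvTranspose2d docs."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

print(conv_transpose_out(1, 4, stride=2, padding=1))   # 2, matches the snippet above
print(conv_transpose_out(95, 4, stride=2, padding=1))  # 190, so a 95x95 input works
```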
I finally figured it out. Thanks for the 2^n tip, it was useful!
Hi everyone. If you want to control the output size of any convolution operation, you can follow the arithmetic here: http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html.
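For the ordinary (downsampling) Conv2d direction, the arithmetic in that guide boils down to a one-liner; here is a hypothetical sketch applied to the 190x190 images from the original question:

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Standard convolution output-size arithmetic (floor division):
    # out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

print(conv_out(190, 4, stride=2, padding=1))  # 95, a DCGAN-style block halves 190 -> 95
print(conv_out(64, 3, stride=1, padding=1))   # 64, "same" padding keeps the size
```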