Why downscale 64x64 images by a factor of 100?

collinarnett · May 23, 2020, 7:54pm

I’m reading this paper on a DCGAN and the authors write:

For upsampling by the generator, we use strided convolution transpose operations instead of pixel
shuffling [43] or interpolation, as we found this to work better in practice. We set the slope of the
LeakyReLU units to 0.2 and the dropout rate to 0.1 during training. Importantly, we did not normalize
input maps but scaled them down by a constant factor.

They also mention it in the model architecture:

------------------64 GAN------------------
down-scale factor = 100

--Generator--
nz = 100
ConvTranspose2d( 512, 4, 1, 0)
BatchNorm2d(512)
LeakyReLU(0.2),
ConvTranspose2d(256, 4, 2, 1) 
BatchNorm2d(256)
LeakyReLU(0.2)
ConvTranspose2d(128, 4, 2, 1) 
BatchNorm2d(128)
LeakyReLU(0.2)
ConvTranspose2d(64, 4, 2, 1) 
BatchNorm2d(64)
LeakyReLU(0.2)
ConvTranspose2d(1, 4, 2, 1) 
Clamp(>0)
Enforce Symmetry

Wouldn’t this mean that the image goes from 64x64 to 6x6? How can the generator/discriminator ever produce images that resemble this if they’re pulling from such a small image?

ptrblck · May 24, 2020, 4:07am

I don’t think “downscaling” is referring to the spatial size, but to a multiplicative factor.

Here normalization is mentioned:

which seems to point towards using a constant factor for the downscaling via:

act = act * 1/factor

collinarnett · May 24, 2020, 11:51pm

Ok that makes more sense thank you!