Minimum changes to DCGAN to run on 32x32 images (and worrying behavior)

I’m trying to run the DCGAN example on ImageNet 32x32, but am running into problems.

If I just change --imageSize to 32, the convolutional layers break and I get the error RuntimeError: sizes must be non-negative. I changed the kernel size of the final Generator layer to 1 and the kernel size of the final Discriminator layer to 2 (as per the suggestion in this related issue on the PyTorch GitHub), but then I get a size mismatch error: ValueError: Target and input must have the same number of elements. target nelement (64) != input nelement (256). I haven’t made any other changes to main.py, as I want to establish a baseline model.

What other changes to the parameters/Generator/Discriminator do I need?

This is a bit unrelated, but I’m also concerned that a full 25-epoch run of the unmodified DCGAN on LSUN (64x64) produced the following result:

[image: generator/discriminator loss curves]
The generator (as per the DCGAN paper) produced passable images in the first few epochs, e.g.

[image: samples from an early epoch]
but by the 5th epoch the generator had collapsed completely and could only produce images like this, never recovering (as the loss graphs above show):

[image: collapsed samples]
What has gone wrong here? I’m at a loss!

I guess your generator still outputs a fake image at a resolution of 64x64, while your discriminator expects an input of 32x32. Could you check it and let me know if that’s not the issue?
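A quick way to check is to push a batch of noise through the generator and print the output shape (a minimal sketch; netG and the batch size of 4 are placeholders for whatever your script uses):

import torch

nz = 100  # latent size, as in the DCGAN example
noise = torch.randn(4, nz, 1, 1)
with torch.no_grad():
    fake = netG(noise)  # netG: your instantiated generator
print(fake.shape)  # for a 32x32 setup you want torch.Size([4, 3, 32, 32])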

Yes @ptrblck, it seems that my generator does still output a 64x64 image, even if I set ngf (the number of filters in the generator) to 32.
This is the current generator:

nz = 100
ngf = 64  # originally 64; does this need to change to 32?
nc = 3
self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            # filter_size was originally 4, changed to 1 to avoid
            # RuntimeError: sizes must be non-negative
            nn.ConvTranspose2d(    ngf,      nc, 1, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

Is there a simple/standard change I can make to the above? Or is it just trial-and-error ’til the tensor is the right shape? Also, any idea what caused the weird behavior in the unmodified DCGAN experiment?

Thanks as always for the help!

The number of filters (ngf) won’t change the spatial size. You could remove a “conv transpose block” from your generator; the spatial sizes are explained in the code comments. If you just remove the last-but-one block, your code should work.
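For reference, nn.ConvTranspose2d (with dilation=1) produces out = (in - 1) * stride - 2 * padding + kernel_size + output_padding, so you can verify every layer’s spatial size on paper. A tiny helper implementing just that formula:

def convtranspose2d_out(size, kernel, stride, padding, output_padding=0):
    # output-size formula from the nn.ConvTranspose2d docs (dilation=1)
    return (size - 1) * stride - 2 * padding + kernel + output_padding

print(convtranspose2d_out(32, 4, 2, 1))  # 64: the original last layer upsamples 32x32 -> 64x64
print(convtranspose2d_out(32, 1, 2, 1))  # 61: the kernel_size=1 change still doesn't give 32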

Unfortunately, I’m not sure what causes the mode collapse. Usually I just try different hyperparameters and hope it works.
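If you want one concrete thing to try: one-sided label smoothing (Salimans et al., 2016), i.e. training D against a real label of 0.9 instead of 1.0, is a cheap and commonly used tweak. No guarantee it fixes the collapse; roughly, in the training loop it would be (batch_size is a placeholder for whatever main.py uses):

real_label = 0.9  # instead of 1; D's target for real images
label = torch.full((batch_size,), real_label)  # then used as before in criterion(output, label)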

Thanks again @ptrblck
Sorry, do you mean remove this block?

nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
nn.BatchNorm2d(ngf),
nn.ReLU(True),

So that the generator looks like this?

self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            #nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            #nn.BatchNorm2d(ngf),
            #nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            # filter_size was originally 4, changed to 1 to avoid
            # RuntimeError: sizes must be non-negative
            nn.ConvTranspose2d(    ngf,      nc, 1, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

I’m assuming I’ll also need to change the filter sizes somewhere?

Yes, this should work if you also change the in_channels of your last nn.ConvTranspose2d layer to ngf*2.


Yes, that seems to help. What should the filter_size (kernel_size) for the last layer be? And do I need to change anything in my discriminator?

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 2, 1, 0, bias=False),
            nn.Sigmoid()
        )

This related GitHub issue suggests setting kernel_size to 1 in the last layer of G and to 2 in the last layer of D, but this still results in RuntimeError: sizes must be non-negative. What should kernel_size be in this case?

Your code looks alright; I can’t find any differences from my code.
Could you check if my code works for you, or if you get the same error?
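For completeness, after removing one block from each network and restoring kernel_size=4, stride=2, padding=1 in G’s last layer, a 32x32 pair typically looks like this (a sketch of the standard layout, not necessarily line-for-line identical to the linked code):

# Generator: (nz) x 1 x 1 -> (nc) x 32 x 32
nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),       # -> (ngf*8) x 4 x 4
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # -> (ngf*4) x 8 x 8
    nn.BatchNorm2d(ngf * 4),
    nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # -> (ngf*2) x 16 x 16
    nn.BatchNorm2d(ngf * 2),
    nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, nc, 4, 2, 1, bias=False),       # -> (nc) x 32 x 32
    nn.Tanh()
)

# Discriminator: (nc) x 32 x 32 -> 1 x 1 x 1
nn.Sequential(
    nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),                    # -> (ndf) x 16 x 16
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),               # -> (ndf*2) x 8 x 8
    nn.BatchNorm2d(ndf * 2),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),           # -> (ndf*4) x 4 x 4
    nn.BatchNorm2d(ndf * 4),
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(ndf * 4, 1, 4, 1, 0, bias=False),                 # -> 1 x 1 x 1 (kernel_size=4 here, not 2)
    nn.Sigmoid()
)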


Double-checked against your code, made one or two edits, and it looks like it’s working!

Thanks once again, I really appreciate it! 🙂
