DCGAN help needed for tuning 48 x 84 output size

Hi, I am training a DCGAN to generate piano rolls of size 48 x 84 (grayscale). So far I am able to generate piano rolls at 48 x 48, but I just cannot get it to work for 48 x 84 and am getting desperate now. I am not looking for an exact solution, just some advice: if it is not working, what should I change? The number of channels? The filter sizes? The strides? Or does this output size simply not work? I'm just kind of lost here. The architecture of the discriminator is the mirror of the generator, and I train the discriminator once and the generator once per iteration.

import torch.nn as nn

nz = 100  # size of the latent vector
ngf = 64  # base number of generator feature maps
nc = 1    # number of output channels (grayscale)

netG = nn.Sequential(
    # input: [N, nz, 1, 1]
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(True),
    # torch.Size([25, 512, 4, 4])
    nn.ConvTranspose2d(ngf * 8, ngf * 4, (2, 4), 2, 0, bias=False),
    nn.BatchNorm2d(ngf * 4),
    nn.ReLU(True),
    # torch.Size([25, 256, 8, 10])
    nn.ConvTranspose2d(ngf * 4, ngf * 2, (2, 4), 2, 0, bias=False),
    nn.BatchNorm2d(ngf * 2),
    nn.ReLU(True),
    # torch.Size([25, 128, 16, 22])
    nn.ConvTranspose2d(ngf * 2, ngf, (3, 4), (3, 2), 2, bias=False),
    nn.BatchNorm2d(ngf),
    nn.ReLU(True),
    # torch.Size([25, 64, 44, 42])
    nn.ConvTranspose2d(ngf, nc, (7, 4), (1, 2), 1, bias=False),
    # torch.Size([25, 1, 48, 84])
    nn.Sigmoid(),
)

I am not sure if this is the right place to ask; if it's not, I will take it down.

It looks like you already tried to double the width of your conv filters. Did it improve anything?
If not, it's just a wild idea and could fail totally, but have you thought about cutting the image and using the second half as an additional channel?
E.g. your current input size [1, 48, 84] would become [2, 48, 42].
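
Roughly what I mean, just to illustrate the reshape (the tensor x here is made up):

import torch

x = torch.randn(1, 48, 84)                 # one grayscale piano roll
left, right = x[:, :, :42], x[:, :, 42:]   # split the width in half
x2 = torch.cat([left, right], dim=0)       # shape [2, 48, 42]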

You may just need some padding in the right place. What I usually do in these situations is the good old print() debugging: print the sizes of the tensors after each layer to better track what's going on.
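
For example, something along these lines (assuming the generator is wrapped in an nn.Sequential called netG, as in the code above):

import torch

x = torch.randn(25, nz, 1, 1)  # a batch of 25 latent vectors (nz = 100 as above)
for layer in netG:
    x = layer(x)
    print(layer.__class__.__name__, tuple(x.shape))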

You can calculate the “same” padding for convolution via

(w - k + 2*p)/s + 1 = o
=> p = (s(o-1) - w + k)/2

where p is the padding amount, w is the input width, k is the kernel size, s is the stride, and o is the desired output size. In your case you would need to do that separately for the height and the width.
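
A tiny helper for that rearranged formula could look like this (just a sketch; the function name and the error check are my own):

def same_padding(w, k, s, o):
    # p = (s*(o - 1) - w + k) / 2, from the formula above
    p = (s * (o - 1) - w + k) / 2
    if p < 0 or p != int(p):
        raise ValueError("no integer padding gives this output size; change k or s")
    return int(p)

# e.g. halving a width of 84 with a 4-wide kernel and stride 2:
# same_padding(84, 4, 2, 42) -> 1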

Sorry for the late reply. Nope, that was just a wild idea on my part. For now I am generating [1, 48, 64] and will try to work up to [1, 48, 84]. Your idea sounds nice, however what I want is a [1, 48, 84] output from the generator, not a [1, 48, 84] input. Or are you referring to the discriminator?

Thanks for the advice, but I do have the sizes correct. The problem was that the DCGAN training was not converging: the generator loss kept climbing gradually with that network specification. Is there any rule of thumb for the filter sizes of the generator, for example 3 x 3 or 5 x 5? I could not use square filters everywhere, since the output I want is 48 x 84 (neglecting the batch size and number of channels of the tensor).

Thanks for the advice, but I do have the sizes correct. The problem was that the DCGAN training was not converging.

Sorry, I misunderstood. Somehow I was thinking the problem was getting the network to "technically" handle the change in width.

Sorry for the late reply. Nope, that was just a wild idea on my part. For now I am generating [1, 48, 64] and will try to work up to [1, 48, 84].

I would also maybe try to keep the heights and widths in the hidden layers proportional to the output (if you are not doing so already), e.g. something like 14x24x64 or 4x8x128, etc. You could do this by changing the stride for the width; see the rough sketch after this post.

Also, do I understand correctly that you train on 48 x 84 images but provide 48 x 64 to the discriminator?
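
To make the proportional idea concrete, here is a rough sketch of a generator whose hidden feature maps keep the 4:7 ratio of a 48 x 84 output. The kernel sizes, strides, and channel counts are only one possible choice and not something I have tested on your data:

import torch.nn as nn

nz, ngf, nc = 100, 64, 1

netG_proportional = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, (4, 7), 1, 0, bias=False),  # -> [N, 512, 4, 7]
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 3, 3, 0, bias=False),  # -> [N, 256, 12, 21]
    nn.BatchNorm2d(ngf * 4),
    nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # -> [N, 128, 24, 42]
    nn.BatchNorm2d(ngf * 2),
    nn.ReLU(True),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # -> [N, 64, 48, 84]
    nn.BatchNorm2d(ngf),
    nn.ReLU(True),
    nn.Conv2d(ngf, nc, 3, 1, 1, bias=False),                    # -> [N, 1, 48, 84]
    nn.Sigmoid(),
)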


Sorry for the late reply, just finished my semester exams. I really appreciate your help and the prompt reply.

I trained on 48 x 64 images and discarded the pixels above 64. I have managed to get it working now: I dropped the previous idea and generate 96 x 96 images instead, keeping all the hidden layers proportional to the output.