DCGAN help needed for tuning 48 x 84 output size


(Ph) #1

Hi, I am training a DCGAN to generate piano rolls of size 48 x 84 (grayscale). So far I am able to generate piano rolls of 48 x 48, but I just cannot get it to work for 48 x 84 and am getting desperate. I am not looking for an exact solution, just some advice: if it's not working, what should I change, the number of channels, the filter size, the stride, or does this output size simply not work? I'm just kinda lost here. The architecture of the discriminator mirrors the generator. I am training the discriminator once and the generator once per iteration.

import torch.nn as nn

nz = 100  # size of the latent vector
ngf = 64  # base number of generator feature maps
nc = 1    # number of output channels (grayscale piano roll)

netG = nn.Sequential(
    # input: latent vector of shape (N, nz, 1, 1)
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.BatchNorm2d(ngf * 8),
    nn.ReLU(True),
    # torch.Size([25, 512, 4, 4])
    nn.ConvTranspose2d(ngf * 8, ngf * 4, (2, 4), 2, 0, bias=False),
    nn.BatchNorm2d(ngf * 4),
    nn.ReLU(True),
    # torch.Size([25, 256, 8, 10])
    nn.ConvTranspose2d(ngf * 4, ngf * 2, (2, 4), 2, 0, bias=False),
    nn.BatchNorm2d(ngf * 2),
    nn.ReLU(True),
    # torch.Size([25, 128, 16, 22])
    nn.ConvTranspose2d(ngf * 2, ngf, (3, 4), (3, 2), 2, bias=False),
    nn.BatchNorm2d(ngf),
    nn.ReLU(True),
    # torch.Size([25, 64, 44, 42])
    nn.ConvTranspose2d(ngf, nc, (7, 4), (1, 2), 1, bias=False),
    # torch.Size([25, 1, 48, 84])
    nn.Sigmoid(),
)
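
As a quick sanity check of the sizes (just a sketch; netG refers to the nn.Sequential above and 25 is the batch size from the shape comments):

import torch

noise = torch.randn(25, nz, 1, 1)  # dummy batch of latent vectors
fake = netG(noise)
print(fake.shape)                  # torch.Size([25, 1, 48, 84])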

I am not sure if this is the right place to ask; if it's not, I will take this down.


#2

It looks like you already tried to double the width of your conv filters. Did it improve anything?
If not, it's just a wild idea and could fail completely, but have you thought about cutting the image in half and using the second half as an additional channel?
E.g. your current input size of [1, 48, 84] would become [2, 48, 42].
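
Just to make that concrete, a rough sketch of the split with plain tensor ops (not tested against your data pipeline):

import torch

roll = torch.rand(1, 48, 84)                   # dummy grayscale piano roll
left, right = roll.split(42, dim=2)            # two tensors of shape [1, 48, 42]
two_channel = torch.cat([left, right], dim=0)  # [2, 48, 42]
print(two_channel.shape)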


(Sebastian Raschka) #3

You may just need some padding at the right place. What I usually do in these situations is the goode olde print() debugging, printing the sizes of the tensors after each convolution to better track what’s going on.
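
Something along these lines, assuming your generator layers are wrapped in an nn.Sequential called netG as in your snippet:

import torch

x = torch.randn(25, 100, 1, 1)  # dummy latent batch (nz = 100)
for layer in netG:              # nn.Sequential iterates over its layers
    x = layer(x)
    print(layer.__class__.__name__, tuple(x.shape))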

You can calculate the “same” padding for convolution via

(w - k + 2*p)/s + 1 = o
=> p = (s(o-1) - w + k)/2

where p is the padding amount, w is the input size along that dimension, k is the kernel size, s is the stride, and o is the desired output size. In your case you need to do that separately for the height and the width.
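
If it helps, here is that formula as a tiny helper function (just a sketch; the example numbers are for a discriminator conv with kernel 4 and stride 2 that halves each dimension):

def same_padding(w, k, s, o):
    # solves (w - k + 2*p)/s + 1 = o for p;
    # only usable if the result comes out as a non-negative integer
    return (s * (o - 1) - w + k) / 2

print(same_padding(84, 4, 2, 42))  # width 84 -> 42: p = 1.0
print(same_padding(48, 4, 2, 24))  # height 48 -> 24: p = 1.0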