Confusion regarding Generator ConvTranspose2d and input/output size

I am going through the tutorial below, but I am confused as to how the right image shape is created via the generator.

https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

It seems the latent vector created is a one-dimensional vector of size 100:

We are feeding that into ConvTranspose2d and getting back a 512-channel 4x4 output?

ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)

How exactly is this happening?

In the diagram shown within the tutorial, the 100-dim z is projected into a 1024x4x4 tensor and then converted to
512x8x8 via ConvTranspose2d.

Are we skipping this step? I am assuming the projection would take place using a fully connected layer followed by a reshape?

So what exactly is happening here:

ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)

By feeding the noise tensor of [batch_size, nz, 1, 1] into the first transposed conv layer, the spatial size will be increased.
Let’s have a look at a simple use case using just a single pixel value:

import torch
import torch.nn as nn

b = 1
nz = 1
noise = torch.ones(b, nz, 1, 1)
conv = nn.ConvTranspose2d(nz, nz, 4, 1, 0, bias=False)

output = conv(noise)
print(output.shape)
> torch.Size([1, 1, 4, 4])
print(conv.weight)
> Parameter containing:
tensor([[[[-0.0689, -0.0482,  0.1806,  0.0298],
          [-0.1211,  0.1254, -0.1988, -0.1285],
          [ 0.0184,  0.1757,  0.1835, -0.0602],
          [ 0.0267, -0.0453, -0.0595,  0.0140]]]], requires_grad=True)
print(output)
> tensor([[[[-0.0689, -0.0482,  0.1806,  0.0298],
            [-0.1211,  0.1254, -0.1988, -0.1285],
            [ 0.0184,  0.1757,  0.1835, -0.0602],
            [ 0.0267, -0.0453, -0.0595,  0.0140]]]],
         grad_fn=<ConvTranspose2DBackward>)

As you can see in this example, the kernel is just multiplied with the single input value.
Since I've initialized the input to ones, the output equals the kernel values.
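The same rule scales up through the whole generator: a transposed conv produces an output of size (in - 1) * stride - 2 * padding + kernel, so the tutorial's stack of layers grows a 1x1 input all the way to 64x64. A quick sketch of that (layer hyperparameters as in the tutorial; BatchNorm/ReLU omitted since they don't change the spatial size):

```python
import torch
import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # tutorial defaults

# (kernel, stride, padding) for each generator ConvTranspose2d layer
params = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
size = 1
for k, s, p in params:
    size = (size - 1) * s - 2 * p + k  # transposed-conv output size
    print(size)
# prints 4, 8, 16, 32, 64

# Sanity check with the actual layers:
gen = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
    nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
)
out = gen(torch.randn(1, nz, 1, 1))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```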

Note that if you are using more than a single input channel, the output in this example would be the sum of the per-channel weight kernels over the input channels.
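Here is a small sketch of that channel summation, again using an all-ones 1x1 input but now with two input channels:

```python
import torch
import torch.nn as nn

b, in_ch, out_ch = 1, 2, 1
noise = torch.ones(b, in_ch, 1, 1)
conv = nn.ConvTranspose2d(in_ch, out_ch, 4, 1, 0, bias=False)

output = conv(noise)
print(output.shape)  # torch.Size([1, 1, 4, 4])

# With an all-ones input, the output equals the sum of the two
# kernels (one per input channel); weight has shape [in_ch, out_ch, 4, 4].
expected = conv.weight.sum(dim=0)
print(torch.allclose(output[0], expected))  # True
```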

Thank you so much for this example. I forgot I could just run tests on my own, especially since PyTorch graphs are dynamic. I will play around with strides as well.
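For instance, bumping the stride to 2 (with padding 1, as in the tutorial's later layers) turns a 4x4 input into 8x8:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 4, 4)
conv = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False)
# (4 - 1) * 2 - 2 * 1 + 4 = 8
print(conv(x).shape)  # torch.Size([1, 1, 8, 8])
```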
