ConvTranspose2d output size


I am new to PyTorch and I am at the moment building my first ever GAN network.
While choosing the proper layer architecture I have noticed some behavior that I cannot understand and I am really interested in the inner-working of this function.

Let’s say, that I have got a batch of data looking like this:

input_data = torch.randn(64, 100, 1, 1)

As I understand it, this can be interpreted as a set of 64 pictures, of height=100 pixels and width=1 pixel, in grayscale.
I want to use this random data to generate some images, so I insert it into a network in which the first layer is a ConvTranspose2d. This layer requires the size of the input data to be specified, as well as the size of the output data. This looks strange to me, because I thought that the size of an output image is determined by the size of the input image, the stride, and the padding. However, the layer can fit the output into the given size.

Example 1:

conv = ConvTranspose2d(100, 512, 4, 1, 0)
output = conv(input_data)
torch.Size([64, 512, 4, 4])

Example 2:

conv = ConvTranspose2d(100, 3157, 4, 1, 0)
output = conv(input_data)
torch.Size([64, 3157, 4, 4])
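The two examples above can be checked against the spatial-size formula from the PyTorch docs for ConvTranspose2d; a small sketch (the helper `tconv_out` is just an illustrative name, not a PyTorch function):

```python
import torch
import torch.nn as nn

def tconv_out(size, kernel, stride=1, padding=0, output_padding=0, dilation=1):
    """Spatial output size of ConvTranspose2d, per the formula in the PyTorch docs."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

# The two examples from the question differ only in out_channels,
# so the spatial size is 4x4 in both cases: (1 - 1) * 1 - 2 * 0 + (4 - 1) + 1 = 4
x = torch.randn(64, 100, 1, 1)
for out_channels in (512, 3157):
    conv = nn.ConvTranspose2d(100, out_channels, kernel_size=4, stride=1, padding=0)
    y = conv(x)
    assert y.shape == (64, out_channels, 4, 4)
    assert y.shape[-1] == tconv_out(1, kernel=4, stride=1, padding=0)
```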

Probably, I do not understand some of the Convolutional Transpose layer’s mechanics but I find it very interesting and I wonder whether someone knows a simple answer to the question:
How does the ConvTranspose2d fit the output into an arbitrary number of output channels?

I will be grateful for your help :wink:

No, PyTorch uses the NCHW memory format (channels-first), so your input is a batch of 64 samples, each with 100 channels and a single pixel.

That’s not the case: the layer doesn’t take the input or output size as arguments. You only need to define e.g. the in_channels, out_channels, and kernel_size.

Each output channel is created by its own filter kernel, so you can choose this number arbitrarily.

For a general overview of the conv / transposed conv arithmetic, have a look at this tutorial.

Thank you very much, this explains a lot. The linked tutorial in particular comes in handy.
Now I understand the problem I had with the channels.
Thank you one more time.