Understanding `output_size` in ConvTranspose2d

I’m using `ConvTranspose2d` in an autoencoder architecture to upsample, and I’m trying to understand the following code snippet (adapted from the docs):

import torch
import torch.nn as nn
import torch.nn.functional as F

input = torch.randn(1, 16, 13, 12)
conv = nn.Conv2d(16, 16, 3, stride=1, padding=1)
downsample = nn.MaxPool2d(2)
upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
h = conv(input)
print('after conv', h.size())
h = downsample(h)
print('after pooling', h.size())

output = upsample(h, output_size=input.size())

Here I want the size of the output to be the same as that of the input. However, this snippet throws an error: “requested an output size of torch.Size([13, 12]), but valid sizes range from [11, 11] to [12, 12] (for an input of torch.Size([6, 6]))”.
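The valid range in the error message follows from the output-size formula in the `ConvTranspose2d` docs (with dilation = 1). A quick plain-Python sanity check of the arithmetic; `convt_out` is just a hypothetical helper name:

```python
# Transposed-convolution output size per spatial dim (from the docs, dilation = 1):
#   out = (in - 1) * stride - 2 * padding + kernel_size + output_padding
# where 0 <= output_padding < stride.
def convt_out(in_size, kernel=3, stride=2, padding=1, output_padding=0):
    return (in_size - 1) * stride - 2 * padding + kernel + output_padding

# For the 6x6 pooled map, the reachable sizes are:
print(convt_out(6, output_padding=0))  # 11
print(convt_out(6, output_padding=1))  # 12
# 13 is unreachable, hence the error for output_size=(13, 12).
```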

It works fine if the input size is even, e.g. `input = torch.randn(1, 16, 12, 12)`. This happens because the downsampling floors the output size from 13/2 = 6.5 to 6, which makes sense. What confuses me is that even with the `output_size` argument set, `ConvTranspose2d` cannot handle this seemingly simple case. Am I using it wrong, or is it indeed expected that client code works around this? I’m currently fixing it like this:

# ...
output = upsample(h)
print('before padding', output.size())
output = F.pad(output, (0, 1, 1, 1))
print('after padding', output.size())
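For reference, `F.pad` takes its padding in last-dimension-first order, i.e. `(left, right, top, bottom)` for a 4-D tensor. The size arithmetic behind the fix above, as a plain-Python check (`padded_hw` is just a hypothetical helper name):

```python
def padded_hw(h, w, pad):
    """Resulting (H, W) after F.pad(x, pad) with pad = (left, right, top, bottom)."""
    left, right, top, bottom = pad
    return h + top + bottom, w + left + right

# upsample(h) yields 11x11 for the 6x6 pooled map; pad back to 13x12:
print(padded_hw(11, 11, (0, 1, 1, 1)))  # (13, 12)
```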

I could also fix it by adjusting the padding in the first place, i.e.:

upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=(0, 1))
# ...
output = upsample(h, output_size=input.size())
print('after upsampling', output.size())

This also works, but I want the padding to be dynamic: when I create `upsample`, I don’t yet know whether the input size will be odd or even. Furthermore, I am not sure the two variants are semantically identical. The first adds padding after the convolution (i.e. my output contains a bunch of zeros in the “edge” rows/columns), whereas the second pads before the convolution (though I’m not sure whether setting the `output_size` parameter also triggers padding after the convolution).
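To check whether `output_size` just zero-fills after the fact, I compared it against an explicit `output_padding`. My understanding (which may be wrong) is that `output_size` is resolved internally to an `output_padding` value, so the extra row/column comes out of the transposed convolution itself rather than being zero-padded afterwards:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
h = torch.randn(1, 16, 6, 6)  # the pooled feature map
up = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)

# Variant A: request the largest reachable size via output_size
a = up(h, output_size=(12, 12))

# Variant B: the same thing spelled out with output_padding
b = F.conv_transpose2d(h, up.weight, up.bias,
                       stride=2, padding=1, output_padding=1)

# Expect True if output_size is indeed resolved to output_padding
print(a.size(), torch.allclose(a, b))
```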

So my question boils down to: how can I ensure, dynamically at runtime, that I get exactly the size I want after an upsampling transposed convolution?