I am attempting to write a variational autoencoder in PyTorch that takes in 3D images of varying dimensions. My code currently works for even input dimensions, which pass through the pooling layers without issue. For odd dimensions, however, the pooling layer rounds down (for example, 15 -> 7). Consequently, the decoder layer that doubles the dimensions produces the wrong output size (7 -> 14, rather than 15).
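Here is a minimal sketch of the mismatch (not my actual model, just the two layers involved, with a single odd spatial dimension):

```python
import torch
import torch.nn as nn

# One odd spatial dim: (N, C, D, H, W) with D = 15
x = torch.randn(1, 1, 15, 16, 16)

pool = nn.MaxPool3d(kernel_size=2)                       # floors odd dims: 15 -> 7
up = nn.ConvTranspose3d(1, 1, kernel_size=2, stride=2)   # doubles: 7 -> 14

pooled = pool(x)
y = up(pooled)
print(pooled.shape)  # torch.Size([1, 1, 7, 8, 8])
print(y.shape)       # torch.Size([1, 1, 14, 16, 16]) -- depth comes back as 14, not 15
```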
One of the workarounds I've come across is resizing the inputs to a compatible, even dimension before putting them through the network. Are there any other fixes for this issue (such as using padding)? I really appreciate any help.
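To illustrate the kind of padding fix I have in mind, a rough sketch that pads each odd spatial dimension up by one before pooling (using `F.pad`, which takes padding amounts starting from the last dimension):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 15, 16, 16)

# Pad each odd spatial dim up by 1 so pooling halves it cleanly (15 -> 16 -> 8).
# F.pad expects (left, right) pairs starting from the last dimension.
pad = [p for d in reversed(x.shape[2:]) for p in (0, d % 2)]
x_padded = F.pad(x, pad)            # (1, 1, 15, 16, 16) -> (1, 1, 16, 16, 16)
pooled = F.max_pool3d(x_padded, 2)  # -> (1, 1, 8, 8, 8)
```

The open question for me is whether this (or something like `output_padding` on the transposed convolutions) is preferable to resizing the inputs up front.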