Ensure that the output size equals the input size, whatever the input size


I am currently implementing the paper: Multi-Scale Context Aggregation by Dilated Convolutions.

When looking at the PascalVOC 2012 semantic segmentation dataset, the input images are not all of the same size. While the paper suggests using reflection padding in order to obtain a unique input size, I am wondering about the following:

I know that, in the case of a fully convolutional architecture, the output size is not fixed but depends on the input size. These sizes are not necessarily the same, but we can tune the architecture parameters (filter dimensions, padding, …) in order to achieve input size = output size, for a fixed input size.
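To make this concrete, the standard convolution output-size formula tells us exactly when a single layer preserves spatial size: with stride 1, it suffices that padding = dilation × (kernel − 1) / 2 ("same" padding). A minimal stdlib-only sketch (the function name is my own, not from the paper):

```python
import math

def conv_out_size(n, kernel=3, stride=1, padding=1, dilation=1):
    """Standard convolution output-size formula:
    out = floor((n + 2p - d*(k-1) - 1) / s) + 1"""
    return math.floor((n + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1

# With stride 1 and padding = dilation*(kernel-1)/2, the output size
# equals the input size for ANY input size, so a stack of such layers
# (as in a dilated-convolution context module) is size-preserving:
for n in (7, 128, 321):
    assert conv_out_size(n, kernel=3, stride=1, padding=1) == n
    # a dilated conv (dilation 2, kernel 3) needs padding 2 to stay "same"
    assert conv_out_size(n, kernel=3, stride=1, padding=2, dilation=2) == n
```

Since this condition does not involve n at all, a network built only from such layers gives input size = output size for every input size.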

I am thus wondering whether it is possible (or whether a particular architecture exists) to ensure input size = output size whatever the input size?
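One place where this breaks is downsampling: an encoder–decoder that halves and then doubles the resolution only round-trips sizes that are divisible by the total stride, which is one reason datasets get padded to a multiple. A small arithmetic illustration (generic strided-conv and transposed-conv formulas, not any specific architecture):

```python
def down(n, k=3, s=2, p=1):
    """Output size of a strided convolution (kernel 3, stride 2, padding 1)."""
    return (n + 2 * p - k) // s + 1

def up(n, k=2, s=2, p=0):
    """Output size of a transposed convolution (kernel 2, stride 2)."""
    return (n - 1) * s - 2 * p + k

# Even sizes round-trip exactly; odd sizes come back one pixel larger,
# because the strided conv's floor division loses a pixel:
assert up(down(64)) == 64
assert up(down(65)) == 66
```

So stride-free "same"-padded networks preserve any size, while architectures with down/upsampling only preserve sizes divisible by the total stride unless the input is padded first.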

There are some architectures that use such tricks; however, they would not be fully convolutional at all.

Some papers divide the input into a grid so that the number of features is fixed. You can also generate a latent space of a fixed size and then reconstruct from there. However, in general, the output size depends on the input size.
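The fixed-size-latent trick is essentially adaptive pooling: however long the input, the output always has a chosen number of bins. A pure-Python 1D sketch of the idea (PyTorch-style bin edges; the function is illustrative, not from any paper):

```python
def adaptive_avg_pool1d(x, out_size):
    """Average the input into exactly out_size bins, whatever len(x) is.

    Bin i covers indices [floor(i*n/out_size), ceil((i+1)*n/out_size)),
    so every element is covered and the output length is fixed.
    """
    n = len(x)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = -(-((i + 1) * n) // out_size)  # ceiling division
        seg = x[start:end]
        out.append(sum(seg) / len(seg))
    return out

# The output length is 4 regardless of the input length:
assert len(adaptive_avg_pool1d(list(range(10)), 4)) == 4
assert len(adaptive_avg_pool1d(list(range(37)), 4)) == 4
```

This fixes the latent size, but as noted above, the decoder then has to be told the original size to reconstruct it, so the network as a whole is no longer purely convolutional.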
