I am currently implementing the paper: Multi-Scale Context Aggregation by Dilated Convolutions.
When looking at the PascalVOC 2012 semantic segmentation dataset, the input images are not all of the same size. The paper suggests using reflection padding to obtain a uniform input size, but I am wondering about the following:
I know that, in the case of a fully convolutional architecture, the output size is not fixed but depends on the input size. The two are not necessarily equal, but for a fixed input size we can tune the architecture parameters (filter dimensions, padding, stride, …) so that input size = output size.
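To make this concrete, here is a small sketch (not from the paper) of the standard output-size formula for a conv layer, o = floor((i + 2p − k_eff) / s) + 1, where the effective kernel size under dilation d is k_eff = k + (k − 1)(d − 1); all parameter values below are illustrative:

```python
def conv_out_size(i, k, s=1, p=0, d=1):
    """Spatial output size of a conv layer with input size i,
    kernel k, stride s, padding p, dilation d."""
    k_eff = k + (k - 1) * (d - 1)  # effective (dilated) kernel size
    return (i + 2 * p - k_eff) // s + 1

# 3x3 kernel, stride 1, no padding: output shrinks by 2
print(conv_out_size(64, k=3))       # 62
# same kernel with padding 1: size preserved
print(conv_out_size(64, k=3, p=1))  # 64
# dilation 2 widens the effective kernel to 5
print(conv_out_size(64, k=3, d=2))  # 60
```

So for one fixed input size, picking p to offset the kernel (and dilation) makes the output match the input.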
I am thus wondering whether it is possible (or whether a particular architecture exists) to guarantee input size = output size whatever the input size.
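For what it's worth, my current understanding (a sketch, not a claim about the paper's architecture): with stride 1 and an odd kernel, choosing padding p = d(k − 1)/2 gives output size = input size for every input size, so a stack of such "same"-padded layers preserves spatial dimensions end to end:

```python
def same_padded_out(i, k, d=1):
    """Output size of a stride-1 conv with 'same' padding
    chosen to offset the (dilated) kernel; assumes odd k."""
    p = d * (k - 1) // 2           # size-preserving padding
    k_eff = k + (k - 1) * (d - 1)  # effective (dilated) kernel size
    return (i + 2 * p - k_eff) + 1

for i in (17, 64, 321, 500):       # arbitrary input sizes
    assert same_padded_out(i, k=3) == i        # plain 3x3 conv
    assert same_padded_out(i, k=3, d=4) == i   # dilated 3x3 conv
```

The identity breaks as soon as the network downsamples (strided convs, pooling) unless a matching upsampling stage restores the resolution, which is presumably why segmentation networks that output full-resolution maps either avoid downsampling or add decoders.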