The effects of input size on semantic segmentation

Hi, I asked a few questions before about a model I am working on. I am wondering how the input size affects the overall performance. For example, my images are 1024x1024. If I split each image into 4 512x512 tiles for one training run and into 1024 32x32 tiles for another, should I expect the results to differ? Does the batch size also have an effect on this? My theory is that it only affects the number of epochs needed to converge, but I want to be sure. One last question: can I use a model trained on 32x32 images to predict on inputs of a different size, e.g. 256x256? (I know it runs, but does it affect the results?)
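For reference, the tile counts in the question follow from (1024/512)^2 = 4 and (1024/32)^2 = 1024. Here is a minimal PyTorch sketch of such non-overlapping tiling. It assumes each image is a tensor of shape (C, H, W) with H and W divisible by the tile size. `tile_image` is a hypothetical helper, not a library function.

```python
import torch


def tile_image(img: torch.Tensor, tile: int) -> torch.Tensor:
    """Split a (C, H, W) image into non-overlapping tile x tile patches.

    Hypothetical helper. Returns a tensor of shape (num_tiles, C, tile, tile),
    with tiles ordered row by row. Assumes H and W are divisible by `tile`.
    """
    c, h, w = img.shape
    assert h % tile == 0 and w % tile == 0, "image size must be divisible by tile size"
    # unfold height, then width: (C, H // tile, W // tile, tile, tile)
    patches = img.unfold(1, tile, tile).unfold(2, tile, tile)
    # move the tile-grid dims to the front and flatten them into one batch dim
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, tile, tile)


img = torch.randn(3, 1024, 1024)   # hypothetical RGB image
print(tile_image(img, 512).shape)  # torch.Size([4, 3, 512, 512])
print(tile_image(img, 32).shape)   # torch.Size([1024, 3, 32, 32])
```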

I would claim it might depend on the input resolution. For example, the conv kernels were trained to detect features at a specific resolution, and they might fail to detect the same features if you “zoom in/out”, which is one way to look at resizing. If the input resolution stays the same and you only create tiles, I would expect to see differences at the tile borders. These come from the padding, or from a smaller output size if no padding is used, assuming you are working on e.g. a segmentation use case. For a classification use case, I can imagine the model completely failing to detect valid objects from a single 32x32 tile if the object spans the full 1024x1024 input.
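To illustrate the border effect, here is a small sketch with a hypothetical two-layer conv model using zero padding (`padding=1`). Running it on a full 64x64 input and on its four 32x32 tiles gives the same output in the tile interiors. The outputs differ within a couple of pixels of the tile seams, where each tile sees zero padding instead of its neighbor's pixels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical fully convolutional model; padding=1 keeps output size == input size.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 2, kernel_size=3, padding=1),
).eval()

img = torch.randn(1, 1, 64, 64)  # hypothetical 64x64 input
tile = 32

with torch.no_grad():
    full = model(img)  # prediction on the whole image
    stitched = torch.empty_like(full)
    for i in range(0, 64, tile):
        for j in range(0, 64, tile):
            # predict each tile on its own and paste it back in place
            stitched[..., i:i + tile, j:j + tile] = model(img[..., i:i + tile, j:j + tile])

# (64, 64) map of how much the two predictions disagree per pixel
diff = (full - stitched).abs().sum(dim=(0, 1))

# Two 3x3 layers -> 5x5 receptive field -> pixels within 2 of a seam can differ.
near_border = torch.zeros(64, 64, dtype=torch.bool)
near_border[tile - 2:tile + 2, :] = True
near_border[:, tile - 2:tile + 2] = True

print(diff[~near_border].max().item())  # close to 0: tile interiors match
print(diff[near_border].max().item())   # clearly > 0: seams differ
```

With deeper models the receptive field grows, so the band of affected pixels around each seam gets wider.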

The batch size generally affects the overall model training. It also directly influences some layers, such as batchnorm layers, which normalize each batch with its own statistics during training and update their running stats from the input activations.
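A small sketch of the batchnorm point, assuming a default `nn.BatchNorm2d` in training mode. The output for a given sample changes with the batch it is in. Each forward pass also updates the running mean as an exponential moving average controlled by `momentum` (0.1 by default).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)  # hypothetical batch of 8 three-channel feature maps

# Training mode: each sample is normalized with the statistics of the whole
# batch, so the same sample is normalized differently in a batch of 8
# than in a batch of 2.
bn = nn.BatchNorm2d(3, momentum=0.1).train()
out_in_batch_of_8 = bn(x)[0]
out_in_batch_of_2 = nn.BatchNorm2d(3, momentum=0.1).train()(x[:2])[0]
print(torch.allclose(out_in_batch_of_8, out_in_batch_of_2))  # False

# Each training-mode forward pass also updates the running estimates used at
# eval time: running = (1 - momentum) * running + momentum * batch_stat.
# Starting from running_mean = 0, one update gives 0.1 * batch mean.
expected_mean = 0.1 * x.mean(dim=(0, 2, 3))
print(torch.allclose(bn.running_mean, expected_mean, atol=1e-6))  # True
```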

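For the last question (predicting on e.g. 256x256 inputs with a model trained on 32x32 tiles), the answer is the same as before: I claim it depends on the use case.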
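Here is a sketch of why this "works" technically. A fully convolutional model, like the hypothetical one below with no `Linear` layers, produces an output of the same spatial size for any input divisible by its total downsampling factor (here 2). Getting the right output shape says nothing about accuracy, though. Each output pixel still only sees its receptive field, so whether predictions on 256x256 inputs match those on 32x32 tiles depends on the data and the use case.

```python
import torch
import torch.nn as nn

# Hypothetical fully convolutional segmentation model with 2 output classes,
# e.g. trained on 32x32 tiles. No Linear layers, so the input size is not fixed.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                            # halves H and W
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Upsample(scale_factor=2, mode="nearest"),  # restores H and W
    nn.Conv2d(16, 2, kernel_size=1),            # per-pixel class scores
).eval()

with torch.no_grad():
    print(model(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 2, 32, 32])
    print(model(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 2, 256, 256])
```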
