Training performance degrades with larger input image resolution

Hello everyone,

I am doing object segmentation. I have trained an excellent model with 40x40 images as input (each containing, say, 3 objects).

Now I am considering training a second model (call it Model 2) with a larger input image size, e.g. 160x160 images (each containing, say, 13-15 objects).

Surprisingly, Model 2's performance is much worse, even though I made it "wider" and "deeper" to ensure a sufficiently large receptive field.
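
For context, here is a minimal sketch of how one can sanity-check the receptive field of a conv stack with the standard recurrence; the (kernel, stride) specs below are hypothetical placeholders, not my actual architecture:

```python
# Minimal sketch: receptive field of a stack of conv layers, computed with the
# standard recurrence rf += (k - 1) * jump; jump *= s. The layer specs here are
# hypothetical placeholders, not the actual Model 2 architecture.

def receptive_field(layers):
    """layers: list of (kernel_size, stride), ordered from input to output."""
    rf, jump = 1, 1  # field of view and step size of one output unit
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= stride             # stride multiplies the step between outputs
    return rf

# E.g. four 3x3 convs with stride 2 give a 31-pixel receptive field, which
# covers most of a 40x40 input but only a small patch of a 160x160 one.
print(receptive_field([(3, 2)] * 4))  # -> 31
```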

Here is what happened with Model 2's performance, epoch by epoch (object counts measured as in the sketch after this list):
Epoch 0: detected all 13-15 objects, with some false positives around each object.
Epoch 1: retained those false positives, but the detections/localisations of the 13-15 objects began to disappear.
Epoch 2: only the false positives around the objects remained; the actual objects were not detected at all.
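
For reference, the per-epoch counts above come from a check along these lines (a rough sketch: `model` and `val_loader` stand in for my actual segmentation network and validation loader, which are not shown here):

```python
# Rough sketch: count connected foreground components predicted over the
# validation set after each epoch. `model` and `val_loader` are placeholders
# for my actual segmentation network and validation data.
import torch
from scipy import ndimage

@torch.no_grad()
def count_predicted_objects(model, val_loader, threshold=0.5):
    model.eval()
    total = 0
    for images, _ in val_loader:
        # Binarise per-pixel logits into a foreground mask.
        masks = (torch.sigmoid(model(images)) > threshold).cpu().numpy()
        for mask in masks:
            _, n_components = ndimage.label(mask.squeeze())  # connected components
            total += n_components
    return total
```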

Has anyone experienced a similar situation? Any thoughts are welcome! :slight_smile:

Thank you.