How to deal with a mismatch between output image size and target image size

Hi, I’m building a CNN model with MobileNet layers as the encoder. The decoder is intended to output depth data from an image, but at a smaller size, let’s say w, h = 30, 30. However, my target depth maps are much bigger (640 x 480), so when comparing the two in the loss function I’m not sure how to handle this size mismatch.

One approach could be to resize the output (from 30 x 30 to 640 x 480) and compare the two… but won’t that impact the error/accuracy?
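
For reference, here is a minimal sketch of this first option in PyTorch, assuming prediction and target are 4D tensors in (N, C, H, W) layout; the shapes and the L1 loss are just placeholders, not necessarily what your model uses:

```python
import torch
import torch.nn.functional as F

# Placeholder tensors standing in for the decoder output and the ground-truth depth map
output = torch.rand(8, 1, 30, 30)     # coarse prediction (N, C, H, W)
target = torch.rand(8, 1, 480, 640)   # full-resolution target

# Option 1: upsample the prediction to the target resolution before computing the loss
output_up = F.interpolate(output, size=target.shape[-2:],
                          mode='bilinear', align_corners=False)
loss = F.l1_loss(output_up, target)
```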

Another approach could be to resize the target down to the smaller output size (640 x 480 to 30 x 30), but I’m not sure about that either.
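
And a sketch of the second option, downsampling the target to the prediction’s resolution (same placeholder shapes and loss as above, repeated here so the snippet is self-contained):

```python
import torch
import torch.nn.functional as F

output = torch.rand(8, 1, 30, 30)     # coarse prediction (N, C, H, W)
target = torch.rand(8, 1, 480, 640)   # full-resolution target

# Option 2: downsample the target to the prediction's resolution before computing the loss
target_down = F.interpolate(target, size=output.shape[-2:],
                            mode='bilinear', align_corners=False)
loss = F.l1_loss(output, target_down)
```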

What is the right approach to computing the loss correctly, and which one (target or output) should be resized in this case?

I think it depends on your final use case besides the training impact.
I.e., given that your targets are much larger than the model output: how would you want to deal with this shape mismatch when the model is deployed?
Would you need to increase the spatial size of the model output before further using the predictions, or would the output be fine and usable at its small size? Depending on this, I would adapt the training accordingly.

Thanks for the answer @ptrblck. I opted for upsampling the output image, since other examples I looked at use this approach. I’m actually working on replicating a model from a paper for depth estimation.
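
In case it helps others, a rough sketch of how the upsampling could be attached as a final decoder stage; the module name, the refinement conv, and the output size are my own assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleHead(nn.Module):
    """Hypothetical final decoder stage: upsamples the coarse depth map to the
    target resolution inside the model, so the loss can be computed directly
    against the full-size ground truth."""
    def __init__(self, out_size=(480, 640)):
        super().__init__()
        self.out_size = out_size
        # optional refinement conv to smooth interpolation artifacts
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, coarse_depth):  # coarse_depth: (N, 1, 30, 30)
        x = F.interpolate(coarse_depth, size=self.out_size,
                          mode='bilinear', align_corners=False)
        return self.conv(x)           # (N, 1, 480, 640)

# usage example with a dummy coarse prediction
depth_full = UpsampleHead()(torch.rand(8, 1, 30, 30))
```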