Hi, I’m trying to train the second stage of a two-step process for image instance segmentation with two classes. This is multi-label, since a pixel can belong to both classes at the same time (a part of the image can be both class A and class B; for example, class A can be a larger area and class B can exist inside it, so some of the area is both A and B). However, the multi-label nature isn’t the problem. The problem is that the training images vary hugely in size (anywhere from about 10 by 10 pixels up to about 300 by 300 pixels) and are also rectangular.
This is because the model is being trained as a downstream model that takes the bounding-box crops from a YOLO object detection model’s inference as input. The resolution and scale of the images are the same, since they are just bounding boxes of different sizes, but if I were to train a vanilla UNet, I would have to resize and pad them to a uniform size to keep the aspect ratio the same. However, this is not ideal: the backgrounds of my original images are not white and differ from image to image, so padding with a constant colour might not be the best option.
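To make the resize-and-pad idea concrete, this is roughly what I would be doing for a vanilla UNet (a rough PyTorch sketch, not settled code; `target=256` is just an assumed training resolution, and reflection padding instead of a constant fill is only an idea so the border at least comes from the image itself):

```python
import torch
import torch.nn.functional as F

def letterbox_reflect(img: torch.Tensor, target: int = 256) -> torch.Tensor:
    """Resize a CxHxW crop so its longest side equals `target`, then pad the
    shorter side with reflected image content instead of a constant colour."""
    _, h, w = img.shape
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    img = F.interpolate(img[None].float(), size=(new_h, new_w),
                        mode="bilinear", align_corners=False)[0]
    pad_h, pad_w = target - new_h, target - new_w
    # pad order is (left, right, top, bottom); reflection requires the pad to
    # be smaller than the resized dimension, so very elongated boxes would
    # need mode="replicate" (or a constant fill) instead.
    pad = (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)
    return F.pad(img[None], pad, mode="reflect")[0]
```

The same resize and pad would of course have to be applied to the masks as well (with nearest-neighbour interpolation).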
I just wanted to ask the community for opinions on what the best approach would be. Pretty much every model out there requires a fixed input size (usually square, with sides a multiple of 16). Maybe a scale-invariant model like DeepLabv3+ would be better, since it uses ASPP? But DeepLab also expects a uniform input image size.
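Just to make the “multiple of 16” constraint concrete, here is the kind of per-crop padding I have in mind (a rough PyTorch sketch; padding each crop only to the nearest multiple of 16 and feeding images one at a time is just an assumption on my part, not something I’ve validated):

```python
import torch
import torch.nn.functional as F

def pad_to_stride(img: torch.Tensor, stride: int = 16) -> torch.Tensor:
    """Pad a CxHxW crop on the bottom/right so H and W become multiples of
    `stride`; the same padding must be applied to the ground-truth masks."""
    _, h, w = img.shape
    pad_h = (stride - h % stride) % stride
    pad_w = (stride - w % stride) % stride
    # bottom/right padding only, so mask coordinates are unchanged;
    # replicate fill avoids inventing a background colour.
    return F.pad(img[None].float(), (0, pad_w, 0, pad_h), mode="replicate")[0]
```

Whether something like this (batch size 1, or bucketing crops by size) is sensible for training, versus resizing everything to one fixed square, is exactly the kind of input I’m hoping for.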
Would appreciate any input from people who have had similar problems, thank you!