Neural net for coarse segmentation

Alex_I · July 9, 2020, 5:01am

I’m looking at doing something like semantic segmentation of images, but I only have fairly coarse labels: for each 32x32 patch, I know whether the answer should be “yes”, “no” or “unknown”.
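One common way to train on labels like these is to treat each 32x32 patch as a single target cell. “yes” and “no” become class indices 1 and 0, and “unknown” gets an ignore value so that it contributes nothing to the loss. In PyTorch, `nn.CrossEntropyLoss(ignore_index=-100)` (-100 is also its default) does this for N×K×H×W logits and N×H×W integer targets. The pure-Python sketch below only spells out the arithmetic. The label mapping and the helper names `encode_grid` and `masked_cross_entropy` are hypothetical.

```python
import math

IGNORE = -100  # same sentinel value as PyTorch's default ignore_index
LABELS = {"no": 0, "yes": 1, "unknown": IGNORE}  # hypothetical label encoding

def encode_grid(patch_labels):
    """Map a 2-D grid of 'yes'/'no'/'unknown' strings (one per 32x32 patch) to class indices."""
    return [[LABELS[s] for s in row] for row in patch_labels]

def masked_cross_entropy(logits, targets):
    """Mean cross-entropy over grid cells whose target is not IGNORE.

    logits:  H x W x K nested lists (K class scores per cell)
    targets: H x W nested lists of class indices or IGNORE
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for scores, t in zip(logit_row, target_row):
            if t == IGNORE:
                continue  # "unknown" patches are skipped entirely
            m = max(scores)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_z - scores[t]
            count += 1
    return total / count if count else 0.0

targets = encode_grid([["yes", "unknown"], ["no", "no"]])
print(targets)  # [[1, -100], [0, 0]]
logits = [[[0.0, 2.0], [5.0, -5.0]], [[1.0, 0.0], [0.0, 0.0]]]
print(masked_cross_entropy(logits, targets))
```

The scores in the “unknown” cell can be anything. They never reach the loss, so the network gets no gradient pushing those patches either way.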

I’d like to start with a pretrained net that is reasonably well suited for this, and then modify it.

This is a bit like FCN-ResNet101 on PyTorch Hub, but I don’t need the part that upsamples the feature maps back to full resolution. In my case it would probably be slow, and it would likely hurt performance, since my labels just aren’t that precise.
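Here is why no upsampling is needed. A ResNet-style backbone has a total stride of 32, so with its usual “same”-style padding a 1024x1024 input yields a 32x32 feature map. That is exactly one cell per labelled 32x32 patch. The sketch applies the standard conv output-size formula, floor((n + 2p - k) / s) + 1, to the strided layers of torchvision’s ResNet-50. Stride-1 layers with matching padding leave the size unchanged, so they are omitted. The layer list is written out by hand and should be re-checked against whichever backbone you actually use.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1

# Strided layers of torchvision's ResNet-50 as (name, kernel, stride, padding).
# 1x1 convs and stride-1 3x3 convs with padding 1 keep the size, so they are skipped.
RESNET50_STRIDED = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2.0.conv2", 3, 2, 1),
    ("layer3.0.conv2", 3, 2, 1),
    ("layer4.0.conv2", 3, 2, 1),
]

def feature_grid(n, layers=RESNET50_STRIDED):
    """Spatial size after each strided layer, starting from an n-pixel input."""
    sizes = [n]
    for _, k, s, p in layers:
        sizes.append(conv_out(sizes[-1], k, s, p))
    return sizes

print(feature_grid(1024))  # [1024, 512, 256, 128, 64, 32]
```

Since the output grid already matches the label grid, a classifier head on the last feature map can be trained directly against the per-patch labels.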

I’m not sure what the ideal receptive field for each position of the feature map would be, but I suspect it should be on the small side: around 128x128 to 256x256, based on what I know about the data. By comparison, the receptive fields of ResNet/ResNeXt-50 and -101 before the final pooling layer are huge, 483x483 and 1027x1027 respectively (with typical 224x224 inputs, most of that field covers padding). Ideally, I’d like to feed inputs of around 1024x1024 and get receptive fields no larger than 256x256, with a 32x32 stride.
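The receptive-field figures can be reproduced with the usual recurrence. A layer with kernel k and stride s grows the field by (k - 1) times the product of all earlier strides (the “jump” j): r ← r + (k - 1)·j, then j ← j·s. The sketch below builds the main-path layer list of a bottleneck ResNet and ignores residual shortcuts, since they never reach further than the main path. With the downsampling stride on the first 1x1 conv of a block, as in the original ResNet, it gives 483 for ResNet-50 and 1027 for ResNet-101. With the stride on the 3x3 conv instead, which is what torchvision’s ResNets do, the same depths give 427 and 971. So the exact figure depends on which variant you start from. The helper names are hypothetical.

```python
def receptive_field(layers):
    """Return (receptive field, total stride) for a chain of (kernel, stride) layers."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j  # the kernel spans (k-1) more steps of size j in input pixels
        j *= s
    return r, j

def bottleneck_resnet(blocks_per_stage, stride_on_3x3):
    """Main-path (kernel, stride) list of a bottleneck ResNet, up to the last stage.

    stride_on_3x3=False: original ResNet (stride in the first 1x1 conv of a block).
    stride_on_3x3=True:  torchvision-style (stride in the 3x3 conv).
    """
    layers = [(7, 2), (3, 2)]  # stem: 7x7/2 conv, then 3x3/2 max-pool
    for stage, n_blocks in enumerate(blocks_per_stage):
        for b in range(n_blocks):
            s = 2 if (stage > 0 and b == 0) else 1  # first block of stages 2-4 downsamples
            if stride_on_3x3:
                layers += [(1, 1), (3, s), (1, 1)]
            else:
                layers += [(1, s), (3, 1), (1, 1)]
    return layers

RESNET50 = [3, 4, 6, 3]
RESNET101 = [3, 4, 23, 3]

for name, cfg in (("ResNet-50", RESNET50), ("ResNet-101", RESNET101)):
    for on_3x3 in (False, True):
        r, j = receptive_field(bottleneck_resnet(cfg, on_3x3))
        print(name, "3x3-stride" if on_3x3 else "1x1-stride", r, j)
# ResNet-50 1x1-stride 483 32
# ResNet-50 3x3-stride 427 32
# ResNet-101 1x1-stride 1027 32
# ResNet-101 3x3-stride 971 32
```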

Is there any example in the literature or a well-known dataset of coarse segmentation similar to this? What would be the state of the art?
What is the best network to use as a starting point? Can I use a pretrained classifier network (e.g. some of the nice swsl_resnext… models in timm) and just remove the top pooling layer? Is any particular classifier best for this (ResNeXt vs. EfficientNet vs. …)? Or should I use a network that is already intended for segmentation? (In that case I’m a lot less clear on what to remove.)
If I use a network with a large receptive field, how should I shrink it? Should I just remove blocks from the top until the field shrinks enough?
What should I do about padding? I’d really like padding not to affect the results. I can ignore the outputs that are too close to the edges, but it would be nicer to modify the network so that it has no padding at all. I’m just not sure what that would do to a pretrained network that was trained with padding. Any tips?
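On the question of removing the top pooling layer from a pretrained classifier: the natural replacement head is a 1x1 convolution. It applies one linear classifier (K outputs from C channels) independently at every position of the C x H x W feature map, so each output cell gets its own logits. In PyTorch that head would be `nn.Conv2d(C, K, kernel_size=1)` on the backbone output. For a torchvision ResNet, `nn.Sequential(*list(model.children())[:-2])` drops `avgpool` and `fc`. timm models have a `forward_features` method that returns features before the pooling/classifier head, though that is worth confirming for the specific model. The pure-Python sketch below only demonstrates the equivalence. The shapes, values and function names are illustrative.

```python
def conv1x1(features, weight, bias):
    """Apply a 1x1 convolution to a C x H x W feature map.

    weight: K x C, bias: K  ->  returns K x H x W logits.
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    K = len(weight)
    return [
        [
            [bias[k] + sum(weight[k][c] * features[c][y][x] for c in range(C))
             for x in range(W)]
            for y in range(H)
        ]
        for k in range(K)
    ]

def linear(vec, weight, bias):
    """Ordinary fully connected layer on one C-dimensional vector."""
    return [b + sum(w_c * v_c for w_c, v_c in zip(w, vec)) for w, b in zip(weight, bias)]

# Toy 3-channel, 2x2 feature map and a 2-class head ("no", "yes").
features = [
    [[1.0, 0.0], [2.0, 1.0]],
    [[0.5, 1.0], [0.0, 3.0]],
    [[0.0, 2.0], [1.0, 1.0]],
]
weight = [[1.0, -1.0, 0.5], [-0.5, 2.0, 1.0]]
bias = [0.1, -0.2]

logits = conv1x1(features, weight, bias)
# Each spatial cell equals the linear layer applied to that cell's channel vector.
for y in range(2):
    for x in range(2):
        cell = [features[c][y][x] for c in range(3)]
        assert linear(cell, weight, bias) == [logits[k][y][x] for k in range(2)]
```

A pretrained `fc` layer could in principle be reshaped into such a 1x1 conv. For a yes/no task, though, a freshly initialised head is the simpler choice.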
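On shrinking the receptive field by removing blocks from the top: this works, but dropping the whole last stage also drops the total stride from 32 to 16. Something then has to restore the 32-pixel output stride, and that layer enlarges the field too. The sketch walks a torchvision-style ResNet-50 main path (stride in the 3x3 conv) block by block. After each block it records the field and stride, and the field after a hypothetical average pool whose kernel and stride bring the total stride up to 32. It then reports the deepest cut that stays within a 256-pixel budget. Cutting and then pooling is just one option. A strided conv, or subsampling the output grid, would also restore the stride. The function names are hypothetical.

```python
def truncation_table(blocks_per_stage=(3, 4, 6, 3), target_stride=32):
    """(block name, field, stride, field after a stride-restoring avg pool) per block."""
    r, j = 11, 4  # after the stem: 7x7/2 conv (r=7, j=2), then 3x3/2 max-pool
    rows = []
    for stage, n_blocks in enumerate(blocks_per_stage, start=1):
        for b in range(n_blocks):
            s = 2 if (stage > 1 and b == 0) else 1
            r += 2 * j  # only the 3x3 conv grows the field; the 1x1 convs add nothing
            j *= s
            pool = target_stride // j  # pool kernel = pool stride needed to reach target
            rows.append((f"layer{stage}.{b}", r, j, r + (pool - 1) * j))
    return rows

def deepest_cut(max_rf, **kwargs):
    """Last block whose pooled receptive field is still <= max_rf (None if none fit)."""
    ok = [row for row in truncation_table(**kwargs) if row[3] <= max_rf]
    return ok[-1] if ok else None

for name, r, j, r_pooled in truncation_table():
    print(f"{name:10s} rf={r:4d} stride={j:2d} rf_after_pool={r_pooled:4d}")
print(deepest_cut(256))  # ('layer3.4', 235, 16, 251)
```

In torchvision terms, that cut would mean keeping `layer1`, `layer2` and `layer3[:5]`, then adding a 2x2 average pool with stride 2. Whether the shallower features are good enough for the task is an empirical question.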
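On padding: stripping the padding from a pretrained ResNet would do more than shift its statistics. It would also break the residual additions, because the main branch would shrink while the shortcut branch would not. A less invasive option is to keep the padding and discard only the output cells whose receptive field reaches into it. Each padded layer moves the start of the field left by its padding times the current jump. So output cell i covers input pixels [i*j - P, i*j - P + r - 1], where P is the summed effective padding. The sketch computes r, j and P for a chain of (kernel, stride, padding) layers, plus the cells that see only real pixels. The layer list is torchvision’s ResNet-50 main path written out by hand. It is an assumption to re-derive for whatever network you end up with.

```python
def field_geometry(layers):
    """For (kernel, stride, padding) layers return (r, j, P): receptive field,
    total stride, and total effective padding on the leading edge."""
    r, j, P = 1, 1, 0
    for k, s, p in layers:
        P += p * j  # this layer's padding, measured in input pixels
        r += (k - 1) * j
        j *= s
    return r, j, P

def valid_cells(n, r, j, P):
    """Output cells (along one axis of an n-pixel input) whose receptive field
    [i*j - P, i*j - P + r - 1] lies entirely inside the image."""
    n_out = (n + 2 * P - r) // j + 1  # exact output length for the whole chain
    return [i for i in range(n_out) if i * j - P >= 0 and i * j - P + r - 1 <= n - 1]

def resnet50_main_path():
    """torchvision-style ResNet-50 main path as (kernel, stride, padding)."""
    layers = [(7, 2, 3), (3, 2, 1)]  # conv1, maxpool
    for stage, n_blocks in enumerate((3, 4, 6, 3)):
        for b in range(n_blocks):
            s = 2 if (stage > 0 and b == 0) else 1
            layers += [(1, 1, 0), (3, s, 1), (1, 1, 0)]
    return layers

r, j, P = field_geometry(resnet50_main_path())
cells = valid_cells(1024, r, j, P)
print(r, j, P, cells[0], cells[-1], len(cells))  # 427 32 213 7 25 19
```

With the full ResNet-50 and a 1024x1024 input, only the central 19x19 of the 32x32 output grid is completely unaffected by padding. A network with a smaller receptive field leaves a correspondingly thinner border to discard.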