Convert pretrained resnet for segmentation

Hi all,

I have a pretrained ResNet model that I have used for classification. Is it possible to convert the output of the model for a semantic segmentation task? If so, how? I would just like to evaluate the model for semantic segmentation, not retrain it entirely. The output from the model is just [batch_size, num_features], and I’m wondering how to convert this to a segmentation output. Many thanks!

One difference between your classification model and a segmentation model would be the output shape and thus the last layer used. You would need to make sure that logits are produced for each pixel, not just one set per sample. Replacing the “classifier” (usually linear layers with ReLUs etc.) with conv layers could work, but you would still need to fine-tune these newly initialized layers.
Since your current output has fewer values than the segmentation use case would need, I don’t think there is a proper way to convert it to segmentation outputs without changing the model architecture.
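As a rough sketch of what I mean (using torchvision’s resnet18 and an assumed number of classes; the new conv layer is randomly initialized and would need fine-tuning):

```python
import torch
import torch.nn as nn
from torchvision import models

nb_classes = 21  # assumed number of segmentation classes

# keep the pretrained backbone, drop the avgpool and fc (the "classifier")
resnet = models.resnet18(pretrained=True)
backbone = nn.Sequential(*list(resnet.children())[:-2])

# new conv "classifier" producing per-pixel logits instead of per-sample logits
classifier = nn.Conv2d(in_channels=512, out_channels=nb_classes, kernel_size=1)

x = torch.randn(2, 3, 224, 224)
features = backbone(x)          # [2, 512, 7, 7]
logits = classifier(features)   # [2, nb_classes, 7, 7] -> pixel-level, but downsampled
print(logits.shape)
```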

Thanks for the response! I think I would probably implement fine-tuning. Do you have any recommendations on how to go about retrieving the pixel-level logits?

You could start by replacing the classifier and its linear layers with a new one that uses conv layers.
This would make sure the output is 4-dimensional, i.e. [batch_size, nb_classes, height, width]. Depending on the architecture used, the spatial sizes (height and width) of the later activation maps are often smaller than the input, so your output would either also be smaller in its spatial size (compared to the input), you could try to upscale the activations in the new classifier (see the sketch below), or change the model architecture to avoid aggressive pooling.
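Here is a minimal sketch of the upscaling option; the in_channels and nb_classes values are just assumptions for a ResNet18-style backbone:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    """Conv classifier followed by upsampling back to the input resolution."""
    def __init__(self, in_channels=512, nb_classes=21):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, nb_classes, kernel_size=1)

    def forward(self, features, output_size):
        logits = self.conv(features)              # [N, nb_classes, h, w]
        # bilinearly upscale the (smaller) activation map to the input size
        return F.interpolate(logits, size=output_size,
                             mode='bilinear', align_corners=False)

head = SegmentationHead()
features = torch.randn(2, 512, 7, 7)        # e.g. ResNet18 features for a 224x224 input
out = head(features, output_size=(224, 224))
print(out.shape)                             # torch.Size([2, 21, 224, 224])
```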