Fine-tuning RetinaNet for a user-defined number of classes

Hi!

I’m not very experienced with modifying the architecture of pretrained models, but my application now requires it. I want to change the classification head of the retinanet_resnet50 model to adapt it to a dataset with 6 classes. Any idea of how I can code that?

Hi Adrian!

I’ve never used pytorch’s RetinaNet, but it appears that you can instantiate one
with a pre-trained ResNet50 backbone and a user-specified number of classes.
Doing this won’t load pre-trained weights for the classification head (which makes
sense), so you’ll have to train those weights from scratch.

(It would probably make sense to first train the classification-head weights for a
while, holding the backbone weights fixed so that you don’t adapt the pre-trained
backbone weights to the untrained head weights, after which you could fine-tune
the whole model.)

Consider:

>>> import torch
>>> torch.__version__
'2.3.0'
>>> import torchvision
>>> torchvision.__version__
'0.18.0'
>>> resnet6 = torchvision.models.detection.retinanet_resnet50_fpn(weights_backbone='DEFAULT', num_classes=6)
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to <path_to_pytorch_cache>\torch\hub\checkpoints\resnet50-11ad3fa6.pth
100.0%
>>> resnet6.head.classification_head.num_classes
6

Best.

K. Frank

Thanks for the response!

I tried this, but when I run the training loop the loss doesn’t decrease, which wasn’t happening when I used the pretrained weights. Is there any way to keep those pretrained weights while adapting the model to the desired number of classes?

Hi Adrian!

What specifically did you try?

Are you trying to train the whole backbone or just the classification-head parameters?
Does the loss change at all? Do the parameters you think you are training change at
all? How long did you train for? What optimizer and parameters are you using?

Can you set things up to freeze all but the classification-head parameters and then
try to vigorously overfit the classification head to a smallish subset of your training
data? If you do that – training a lot – does your loss start to come down?

What loss function are you using? Since you’re trying to change the number of
predicted classes, does the loss have a specific class-prediction term that you can
piece out and look at individually?

How, specifically, did you use the pretrained weights? In this case, are you talking
about the model that predicts 91 different classes?

The two models for two different numbers of classes are, of course, slightly different.
As a consistency test, can you randomly reinitialize the classification-head weights
of the 91-class model and see whether the “loss decreases” when you train with
whatever scheme you were using when you “used the pretrained weights?”

Possibly. But note that the shape of the weight tensor in the final layer of the
classification head differs between the pretrained and adapted model. So you
can’t just blindly reuse those particular weights.

Best.

K. Frank