Possible to extend a pre-trained model?

I’d like to add some features to fasterrcnn_resnet50_fpn, for example feature[91]=‘chicken’ and feature[92]=‘frog’, but keep the existing 91 features. Is this possible via transfer learning? All the examples I’ve seen use the pre-trained fasterrcnn_resnet50_fpn, but then train it to only detect a handful of new features, ignoring the existing 91.

If this is possible, could I please get the broad strokes, or some links of how to go about it?

I’ve been looking into it and it looks like the pre-trained model really just gets me a jump on the needed convolutions for detecting features, and that if I want, for example “person”, “chicken”, and “frog” classes, I’ll need to include all of them in my training data set and not just “chicken” and “frog”.

Does that sound about right?

Usually this kind of finetuning is done by replacing the last layer, or the last few layers, of a model and then training with additional data. However, if you want to keep the existing 91 classes, it might be possible to just manually expand the net. If we switch the task to classification for a moment for illustrative purposes, it might look something like:
(starting with 1024 input features, 1000 classes)

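  1. Allocate a new weight matrix with random initialization and shape 1002x1024 (+2 rows for “chicken” and “frog”; PyTorch stores linear weights as out_features x in_features), plus a bias of length 1002
  2. Copy the 1000x1024 pretrained weights (and the 1000 bias values) into the first 1000 rows
  3. Replace the linear layer of the original model with one that uses the new weight matrix and bias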
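
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested recipe: the helper name expand_linear is made up for this example, and the nn.Linear(1024, 1000) layer is a randomly initialized stand-in for a real pretrained classifier head.

```python
import torch
import torch.nn as nn


def expand_linear(old: nn.Linear, num_new: int) -> nn.Linear:
    """Hypothetical helper: copy `old` and append `num_new` output units."""
    # Step 1: allocate a larger layer; PyTorch initializes it randomly.
    new = nn.Linear(old.in_features, old.out_features + num_new,
                    bias=old.bias is not None)
    with torch.no_grad():
        # Step 2: nn.Linear stores weight as (out_features, in_features),
        # so the pretrained classes occupy the first rows.
        new.weight[: old.out_features].copy_(old.weight)
        if old.bias is not None:
            new.bias[: old.out_features].copy_(old.bias)
    return new


torch.manual_seed(0)
head = nn.Linear(1024, 1000)        # stand-in for a pretrained head
expanded = expand_linear(head, 2)   # +2 for "chicken" and "frog"

# Step 3 would be assigning `expanded` back onto the model in place of
# the old layer; here we only compare outputs on a random batch.
x = torch.randn(4, 1024)
old_logits = head(x)
new_logits = expanded(x)
```

Right after the copy, the first 1000 logits of the expanded layer should match the original layer, and only the two new logits come from random weights. Note that a softmax over 1002 logits will still differ from one over 1000, so the old class probabilities shift slightly even before any training.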

I think this approach (for transfer learning) has been explored before (e.g., https://arxiv.org/pdf/1511.05641.pdf). It seems it might be important to initialize the weights for the additional classes properly, so you might play around with things like setting the mean of the weights for the additional classes to match the mean of the others, etc…
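
One way to try the initialization idea is to draw the new rows from a normal distribution whose mean and standard deviation match the existing rows. This is just one guess at a reasonable scheme, not something taken from the linked paper; the function name init_new_rows_like_old and the stand-in layer are assumptions for illustration.

```python
import torch
import torch.nn as nn


def init_new_rows_like_old(layer: nn.Linear, num_old: int) -> None:
    """Hypothetical helper: re-initialize rows >= num_old to match the
    mean/std of the first `num_old` rows (the pretrained classes)."""
    with torch.no_grad():
        old_w = layer.weight[:num_old]
        # Sample the new rows from N(mean, std) of the old weights.
        layer.weight[num_old:].normal_(mean=old_w.mean().item(),
                                       std=old_w.std().item())
        if layer.bias is not None:
            # Give the new classes the average bias of the old ones.
            layer.bias[num_old:].fill_(layer.bias[:num_old].mean().item())


torch.manual_seed(0)
layer = nn.Linear(1024, 1002)           # stand-in for an expanded head
before = layer.weight[:1000].clone()    # rows that must stay untouched
init_new_rows_like_old(layer, num_old=1000)
```

Whether matching the statistics actually helps is an empirical question; it only makes the new logits start out on the same scale as the pretrained ones.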

I’d be interested to see if this works, so it would be great if you could update this thread with experimental results.

I’m not sure how I could apply your idea to the Faster R-CNN model.
I tried simply increasing the out_features of the Fast R-CNN predictor layer and training on some ‘frog’ and ‘chicken’ examples, but that did not preserve detection of the existing 91 classes. The final layers of the model look like this:

  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=91, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=364, bias=True)
    )
  )

If I figure out how to manipulate the final layers to add features, or apply the Net2Net idea from the paper you linked, I’ll be sure to post it.

Note that bbox_pred is not independent of the classes here: its 364 outputs are 4 box coordinates for each of the 91 classes, so it would need to grow to 372 outputs along with cls_score. You might still want to see if you can finetune it without degrading detection performance significantly on the existing pretrained classes.

Then, you might see if you can expand out_features of cls_score to 93 and copy the existing 91x1024 weights (PyTorch stores them as out_features x in_features) into the first 91 rows of the new weight matrix, along with the 91 existing bias values.
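
Applied to the predictor printed above, the expansion might look like the sketch below. It is untested against a real detector: the two nn.Linear layers are randomly initialized stand-ins for box_predictor.cls_score and box_predictor.bbox_pred, and expand_linear is a made-up helper. Since bbox_pred has 364 = 91 x 4 outputs, the sketch assumes it holds 4 box coordinates per class, laid out class by class, and grows it by 8 rows as well.

```python
import torch
import torch.nn as nn


def expand_linear(old: nn.Linear, num_new_outputs: int) -> nn.Linear:
    """Hypothetical helper: copy `old` and append extra output rows."""
    new = nn.Linear(old.in_features, old.out_features + num_new_outputs)
    with torch.no_grad():
        # Pretrained rows go first; the appended rows keep random init.
        new.weight[: old.out_features].copy_(old.weight)
        new.bias[: old.out_features].copy_(old.bias)
    return new


torch.manual_seed(0)
num_old, num_new = 91, 2                   # +2 for "chicken" and "frog"
cls_score = nn.Linear(1024, num_old)       # stand-in for cls_score
bbox_pred = nn.Linear(1024, num_old * 4)   # stand-in for bbox_pred

new_cls_score = expand_linear(cls_score, num_new)
# Assumption: 4 regression outputs per class, so 2 new classes = 8 rows.
new_bbox_pred = expand_linear(bbox_pred, num_new * 4)
```

With a real model, the two new layers would then be assigned to model.roi_heads.box_predictor.cls_score and model.roi_heads.box_predictor.bbox_pred. Training on only the new classes afterwards can still erode the old ones, so mixing in data for the existing classes is probably still needed.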

Since I need to train up “frog” and “chicken” anyways, I’ll just throw in training for the other features I need at the same time. The data is readily available and it’s a well-understood process. Thanks for your suggestions and thinking about it.