Object detection fine tuning model initialisation error

Hi All,
I am learning the PyTorch API for fine-tuning object detection models. My torch version is 1.12.1.

from torchvision.models.detection import retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights
from torchvision.models.detection.retinanet import RetinaNetHead
weights = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
model = retinanet_resnet50_fpn_v2(weights=weights, num_classes=3)

The above throws an error:

 num_classes = _ovewrite_value_param(num_classes, len(weights.meta["categories"]))
  File "/home/anaconda3/envs/torch_latest/lib/python3.10/site-packages/torchvision/models/_utils.py", line 246, in _ovewrite_value_param
    raise ValueError(f"The parameter '{param}' expected value {new_value} but got {param} instead.")
ValueError: The parameter '3' expected value 91 but got 3 instead.

The same error happens with SSDLite and possibly for other object detectors.

from torchvision.models.detection import ssdlite320_mobilenet_v3_large, SSDLite320_MobileNet_V3_Large_Weights
weights = SSDLite320_MobileNet_V3_Large_Weights.DEFAULT
model = ssdlite320_mobilenet_v3_large(weights=weights, num_classes=3)

The example provided here uses Faster R-CNN, and the number of classes is initialized in a different way:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # 1 class (person) + background
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

If I follow the same approach for Faster R-CNN, I am able to train on my custom dataset. My aim is to compare the results (mainly inference speed and accuracy) with other object detectors. How can I get SSD and RetinaNet to work?
Thanks

I think this error is expected, as specifying the pretrained weights together with a different num_classes value (than the one used to pretrain the model) conflicts with the checkpoint.
Remove the num_classes argument, load the pretrained model, and create a new classification layer with the desired number of classes afterwards.
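For SSDLite, the same idea looks roughly like this; a minimal sketch based on how torchvision's own ssdlite320_mobilenet_v3_large constructor builds its classification head (the num_classes value and the 320x320 probe size are assumptions you should adapt):

from functools import partial

import torch
from torchvision.models.detection import (
    ssdlite320_mobilenet_v3_large,
    SSDLite320_MobileNet_V3_Large_Weights,
    _utils as det_utils,
)
from torchvision.models.detection.ssdlite import SSDLiteClassificationHead

num_classes = 3  # assumption: 2 foreground classes + background
weights = SSDLite320_MobileNet_V3_Large_Weights.DEFAULT
model = ssdlite320_mobilenet_v3_large(weights=weights)

# probe the backbone with a dummy 320x320 input to get the per-feature-map channel counts
in_channels = det_utils.retrieve_out_channels(model.backbone, (320, 320))
num_anchors = model.anchor_generator.num_anchors_per_location()
norm_layer = partial(torch.nn.BatchNorm2d, eps=0.001, momentum=0.03)

# swap in a freshly initialized classification head for the new number of classes
model.head.classification_head = SSDLiteClassificationHead(
    in_channels, num_anchors, num_classes, norm_layer
)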

Thank you for the reply… I tried creating a classification layer as in the Faster R-CNN example, but it said these models don't have roi_heads. Could you please give an example of how to do this, and point me to any documentation on it?

That’s strange, as this code works for me properly:

import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
num_classes = 2  # 1 class (person) + background

print(model.roi_heads.box_predictor.cls_score.in_features)
# 1024

in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)

Are you seeing an error while executing this tutorial?

Sorry for the confusion. The Faster R-CNN example works perfectly. I meant that if I do the same for SSD or RetinaNet to change the number of classes, it doesn't work. I am looking for a code snippet to change the output classes for SSD and RetinaNet.
thanks
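As an aside, one way to discover which attribute to replace is simply to print the model's head: torchvision's single-stage detectors (SSD, RetinaNet, FCOS) expose model.head instead of roi_heads. A minimal sketch:

from torchvision.models.detection import retinanet_resnet50_fpn_v2

model = retinanet_resnet50_fpn_v2(weights="DEFAULT")
# single-stage detectors have no roi_heads; the predictor lives in model.head
print(model.head.classification_head)  # conv stack followed by cls_logits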

@ptrblck : Hi, I managed to modify the FCOS classification head based on an internet post somewhere, and it's working now.

import math
import torch
from torchvision.models.detection import fcos_resnet50_fpn, FCOS_ResNet50_FPN_Weights

num_classes = 3  # set as per your dataset
weights = FCOS_ResNet50_FPN_Weights.DEFAULT
model = fcos_resnet50_fpn(weights=weights)  # load an object detection model pre-trained on COCO
num_anchors = model.head.classification_head.num_anchors
model.head.classification_head.num_classes = num_classes
out_channels = model.head.classification_head.conv[9].out_channels
cls_logits = torch.nn.Conv2d(out_channels, num_anchors * num_classes, kernel_size=3, stride=1, padding=1)
torch.nn.init.normal_(cls_logits.weight, std=0.01)
torch.nn.init.constant_(cls_logits.bias, -math.log((1 - 0.01) / 0.01))
model.head.classification_head.cls_logits = cls_logits  # don't forget to assign the new layer to the model

I am still looking for something similar for SSD and RetinaNet. Thanks

Hey @shyamashi, have you found a way to change the output number of classes for RetinaNet? I have the same issue: I trained my RetinaNet on a 3-class dataset, but now I want to use the already trained model for training on a 2-class dataset.
If you find any patch or notebook that implements this, please share it here as well.
Thanks, Regards,
Harshit

I think the following should work. Please test it, and could you post your results (whether it works or not)?

import math
import torch
from torchvision.models.detection import retinanet_resnet50_fpn_v2, RetinaNet_ResNet50_FPN_V2_Weights
from torchvision.models.detection.retinanet import RetinaNetHead, RetinaNetClassificationHead

num_classes = 3  # modify as per your requirement
weights = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
model = retinanet_resnet50_fpn_v2(weights=weights, box_score_thresh=0.7)

# replace the classification layer
out_channels = model.head.classification_head.conv[0].out_channels
num_anchors = model.head.classification_head.num_anchors
model.head.classification_head.num_classes = num_classes

cls_logits = torch.nn.Conv2d(out_channels, num_anchors * num_classes, kernel_size=3, stride=1, padding=1)
torch.nn.init.normal_(cls_logits.weight, std=0.01)  # as per pytorch code
torch.nn.init.constant_(cls_logits.bias, -math.log((1 - 0.01) / 0.01))  # as per pytorch code
# assign the new cls head to the model
model.head.classification_head.cls_logits = cls_logits

Modify num_classes as per your requirement. The bias initialization -log((1 - 0.01) / 0.01) is the focal-loss prior (pi = 0.01) from the RetinaNet paper.
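A quick way to verify the new head is wired up correctly (a sketch, assuming the num_classes above) is to run a dummy image through the model in eval mode:

model.eval()
with torch.no_grad():
    preds = model([torch.rand(3, 512, 512)])
# labels now come from the new head, so they fall in [0, num_classes)
print(preds[0]["boxes"].shape, preds[0]["labels"].unique())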

Actually, my RetinaNet is not exactly PyTorch's RetinaNet; it is the implementation from a library called fastai (via object-detection-fastai), which uses PyTorch at the lower level.
Here is my RetinaNet model creation in fastai:

backbone = "ResNet34"  # one of ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"]
backbone_model = models.resnet18
if backbone == "ResNet34":
    backbone_model = models.resnet34
if backbone == "ResNet50":
    backbone_model = models.resnet50
if backbone == "ResNet101":
    backbone_model = models.resnet101
if backbone == "ResNet152":
    backbone_model = models.resnet152  # note: torchvision provides resnet152, not resnet150

pre_trained_on_imagenet = True
encoder = create_body(backbone_model, pre_trained_on_imagenet, -2)

loss_function = "FocalLoss"
if loss_function == "FocalLoss":
    crit = RetinaNetFocalLoss(anchors)

channels = 128
final_bias = -4
n_conv = 3

model = RetinaNet(encoder, n_classes=data.train_ds.c,
                  n_anchors=len(scales) * len(ratios),
                  sizes=[size[0] for size in sizes],
                  chs=channels,  # number of channels in the head's hidden layers
                  final_bias=final_bias,
                  n_conv=n_conv  # number of hidden conv layers in each head
                  )

voc = PascalVOCMetric(anchors, patch_size, [str(i) for i in data.train_ds.y.classes[1:]])
learn = Learner(data, model, loss_func=crit,
                callback_fns=[BBMetrics, ShowGraph], metrics=[voc])

Here the library versions are as follows:
bottleneck-1.3.5 fastai-1.0.61 nvidia-ml-py3-7.352.0 object-detection-fastai-0.0.10
torch-1.11.0 torchvision-0.12.0
P.S. I used torch.save(model.state_dict(), PATH) to save the model.
When I load the model for the new dataset (having 2 classes) using PyTorch:

learn.model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu')), strict=False)

it throws the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 learn.model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu')), strict=False)

1 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
   1496         if len(error_msgs) > 0:
   1497             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1498                 self.__class__.__name__, "\n\t".join(error_msgs)))
   1499         return _IncompatibleKeys(missing_keys, unexpected_keys)
   1500

RuntimeError: Error(s) in loading state_dict for RetinaNet:
    size mismatch for classifier.3.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
    size mismatch for classifier.3.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).

So I think you faced the same issue while fine-tuning. Could you tell me, with reference to my code, whether I need to make any other changes to your patch (other than changing resnet50_fpn to resnet34_fpn)?

The RetinaNet implementation from the fastai object detection library that I used:

from fastai import *
from fastai.vision import *
from fastai.callbacks import *
from fastai.vision.models.unet import _get_sfs_idxs

# export
class LateralUpsampleMerge(nn.Module):

    def __init__(self, ch, ch_lat, hook):
        super().__init__()
        self.hook = hook
        self.conv_lat = conv2d(ch_lat, ch, ks=1, bias=True)

    def forward(self, x):
        return self.conv_lat(self.hook.stored) + F.interpolate(x, scale_factor=2)


class RetinaNet(nn.Module):
    "Implements RetinaNet from https://arxiv.org/abs/1708.02002"

    def __init__(self, encoder: nn.Module, n_classes, final_bias:float=0.,  n_conv:float=4,
                 chs=256, n_anchors=9, flatten=True, sizes=None):
        super().__init__()
        self.n_classes, self.flatten = n_classes, flatten
        imsize = (256, 256)
        self.sizes = sizes
        sfs_szs, x, hooks = self._model_sizes(encoder, size=imsize)
        sfs_idxs = _get_sfs_idxs(sfs_szs)
        self.encoder = encoder
        self.c5top5 = conv2d(sfs_szs[-1][1], chs, ks=1, bias=True)
        self.c5top6 = conv2d(sfs_szs[-1][1], chs, stride=2, bias=True)
        self.p6top7 = nn.Sequential(nn.ReLU(), conv2d(chs, chs, stride=2, bias=True))
        self.merges = nn.ModuleList([LateralUpsampleMerge(chs, szs[1], hook)
                                     for szs, hook in zip(sfs_szs[-2:-4:-1], hooks[-2:-4:-1])])
        self.smoothers = nn.ModuleList([conv2d(chs, chs, 3, bias=True) for _ in range(3)])
        self.classifier = self._head_subnet(n_classes, n_anchors, final_bias, chs=chs, n_conv=n_conv)
        self.box_regressor = self._head_subnet(4, n_anchors, 0., chs=chs, n_conv=n_conv)

    def _head_subnet(self, n_classes, n_anchors, final_bias=0., n_conv=4, chs=256):
        layers = [self._conv2d_relu(chs, chs, bias=True) for _ in range(n_conv)]
        layers += [conv2d(chs, n_classes * n_anchors, bias=True)]
        layers[-1].bias.data.zero_().add_(final_bias)
        layers[-1].weight.data.fill_(0)
        return nn.Sequential(*layers)

    def _apply_transpose(self, func, p_states, n_classes):
        if not self.flatten:
            sizes = [[p.size(0), p.size(2), p.size(3)] for p in p_states]
            return [func(p).permute(0, 2, 3, 1).view(*sz, -1, n_classes) for p, sz in zip(p_states, sizes)]
        else:
            return torch.cat(
                [func(p).permute(0, 2, 3, 1).contiguous().view(p.size(0), -1, n_classes) for p in p_states], 1)

    def _model_sizes(self, m: nn.Module, size:tuple=(256,256), full:bool=True) -> Tuple[Sizes,Tensor,Hooks]:
        "Passes a dummy input through the model to get the various sizes"
        hooks = hook_outputs(m)
        ch_in = in_channels(m)
        x = torch.zeros(1,ch_in,*size)
        x = m.eval()(x)
        res = [o.stored.shape for o in hooks]
        if not full: hooks.remove()
        return res,x,hooks if full else res

    def _conv2d_relu(self, ni:int, nf:int, ks:int=3, stride:int=1,
                    padding:int=None, bn:bool=False, bias=True) -> nn.Sequential:
        "Create a `conv2d` layer with `nn.ReLU` activation and optional(`bn`) `nn.BatchNorm2d`"
        layers = [conv2d(ni, nf, ks=ks, stride=stride, padding=padding, bias=bias), nn.ReLU()]
        if bn: layers.append(nn.BatchNorm2d(nf))
        return nn.Sequential(*layers)

    def forward(self, x):
        c5 = self.encoder(x)
        p_states = [self.c5top5(c5.clone()), self.c5top6(c5)]
        p_states.append(self.p6top7(p_states[-1]))
        for merge in self.merges:
            p_states = [merge(p_states[0])] + p_states
        for i, smooth in enumerate(self.smoothers[:3]):
            p_states[i] = smooth(p_states[i])
        if self.sizes is not None:
            p_states = [p_state for p_state in p_states if p_state.size()[-1] in self.sizes]
        return [self._apply_transpose(self.classifier, p_states, self.n_classes),
                self._apply_transpose(self.box_regressor, p_states, 4),
                [[p.size(2), p.size(3)] for p in p_states]]

The code is present here
I hope this makes it clear.

I think I understand your problem; it is very similar to my first question. Your saved model has 3 classes. Then you instantiate your model with 2 classes (how exactly is not clear to me) and try to load the 3-class state_dict into it, which causes the size-mismatch error you see. Note that strict=False only ignores missing or unexpected keys; shape mismatches still raise.
Things you could try out:

  1. Instead of instantiating the model with 2 classes, try with 3 and then load the state_dict to see if the error goes away (just a sanity check). Then replace only the final classification conv, as in the sketch below.
  2. Did you try this one?
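A sketch of the load-then-swap idea for the fastai RetinaNet above (hypothetical: it assumes the model was built with the same anchors/channels as during training, and reuses the init from the torchvision patch earlier in this thread):

import math
import torch
import torch.nn as nn

# 1) build the model exactly as it was trained (3 classes) and load the checkpoint
state_dict = torch.load(PATH, map_location=torch.device('cpu'))
model.load_state_dict(state_dict)

# 2) replace only the final conv of the classification subnet for 2 classes
n_classes_new = 2
n_anchors = len(scales) * len(ratios)  # must match the training config
chs = 128                              # the `channels` value used at construction
model.classifier[-1] = nn.Conv2d(chs, n_classes_new * n_anchors,
                                 kernel_size=3, stride=1, padding=1)
nn.init.normal_(model.classifier[-1].weight, std=0.01)
nn.init.constant_(model.classifier[-1].bias, -math.log((1 - 0.01) / 0.01))
model.n_classes = n_classes_new  # used by the model's forward pass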

Yeah, I have tried 1. — instantiating with 3 classes — and the loading part succeeds (or so we think), but when I call learn.fit() it throws an IndexError.
Code:

state_dict = torch.load('PATH', map_location=torch.device('cpu'))
model.load_state_dict(state_dict, strict=False)
model.train()
voc = PascalVOCMetric(anchors, patch_size, [str(i) for i in data.train_ds.y.classes[1:]])
learn = Learner(data, model, loss_func=crit,
                callback_fns=[BBMetrics, ShowGraph], metrics=[voc])

Error:

5 learn.fit_one_cycle(cyc_len, max_learning_rate,callbacks=[SaveModelCallback(learn, monitor='train_loss',
----> 6                                                                             name='best_train_loss_bs64_GC_1500')])

7 frames
/usr/local/lib/python3.7/dist-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, final_div, wd, callbacks, tot_epochs, start_epoch)
     21     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor, pct_start=pct_start,
     22                                        final_div=final_div, tot_epochs=tot_epochs, start_epoch=start_epoch))
---> 23     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
     24 
     25 def fit_fc(learn:Learner, tot_epochs:int=1, lr:float=defaults.lr,  moms:Tuple[float,float]=(0.95,0.85), start_pct:float=0.72,

/usr/local/lib/python3.7/dist-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    198         else: self.opt.lr,self.opt.wd = lr,wd
    199         callbacks = [cb(self) for cb in self.callback_fns + listify(defaults.extra_callback_fns)] + listify(callbacks)
--> 200         fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
    201 
    202     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.7/dist-packages/fastai/basic_train.py in fit(epochs, learn, callbacks, metrics)
    104             if not cb_handler.skip_validate and not learn.data.empty_val:
    105                 val_loss = validate(learn.model, learn.data.valid_dl, loss_func=learn.loss_func,
--> 106                                        cb_handler=cb_handler, pbar=pbar)
    107             else: val_loss=None
    108             if cb_handler.on_epoch_end(val_loss): break

/usr/local/lib/python3.7/dist-packages/fastai/basic_train.py in validate(model, dl, loss_func, cb_handler, pbar, average, n_batch)
     61             if not is_listy(yb): yb = [yb]
     62             nums.append(first_el(yb).shape[0])
---> 63             if cb_handler and cb_handler.on_batch_end(val_losses[-1]): break
     64             if n_batch and (len(nums)>=n_batch): break
     65         nums = np.array(nums, dtype=np.float32)

/usr/local/lib/python3.7/dist-packages/fastai/callback.py in on_batch_end(self, loss)
    306         "Handle end of processing one batch with `loss`."
    307         self.state_dict['last_loss'] = loss
--> 308         self('batch_end', call_mets = not self.state_dict['train'])
    309         if self.state_dict['train']:
    310             self.state_dict['iteration'] += 1

/usr/local/lib/python3.7/dist-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    248         "Call through to all of the `CallbakHandler` functions."
    249         if call_mets:
--> 250             for met in self.metrics: self._call_and_update(met, cb_name, **kwargs)
    251         for cb in self.callbacks: self._call_and_update(cb, cb_name, **kwargs)
    252 

/usr/local/lib/python3.7/dist-packages/fastai/callback.py in _call_and_update(self, cb, cb_name, **kwargs)
    239     def _call_and_update(self, cb, cb_name, **kwargs)->None:
    240         "Call `cb_name` on `cb` and update the inner state."
--> 241         new = ifnone(getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs), dict())
    242         for k,v in new.items():
    243             if k not in self.state_dict:

/usr/local/lib/python3.7/dist-packages/object_detection_fastai/callbacks/callbacks.py in on_batch_end(self, last_output, last_target, **kwargs)
    153             num_boxes = len(bbox_gt) * 3
    154             for box, cla, scor in list(zip(bbox_pred, preds, scores))[:num_boxes]:
--> 155                 temp = BoundingBox(imageName=str(self.imageCounter), classId=self.metric_names_original[cla], x=box[0], y=box[1],
    156                                    w=box[2], h=box[3], typeCoordinates=CoordinatesType.Absolute, classConfidence=scor,
    157                                    bbType=BBType.Detected, format=BBFormat.XYWH, imgSize=(self.size, self.size))

IndexError: list index out of range

About 2.: I have not tried changing the backbone, but how would that help? It will still be ResNet-34; the problem is only in the last classifier head, so I don't see how changing the encoder backbone would fix my issue.

Also, I just found out that a change in the VOC metric lets the training continue without any error.
I think point 1. does work, but I have to change the voc metric as follows:

voc = PascalVOCMetric(anchors, patch_size, [str(i) for i in old_classes[1:]])
# here old_classes is the old list containing 3 classes, from data.train_ds.y.classes

Are there any other places where I should change my code so that training works properly?

A "list index out of range" error usually means you are indexing past the end of a list — for example, accessing the 101st image when you have 100. Maybe add print(len(...)) calls wherever you suspect the problem, to check where the actual mistake happens. I'm not sure about the fastai library and some of the things you've mentioned, as I haven't spent time on it.

No, the error is not caused by the image index; fastai ensures that during training the images are not accessed outside the defined DataBunch. If you look at the traceback, it happens at:

--> 155                 temp = BoundingBox(imageName=str(self.imageCounter), classId=self.metric_names_original[cla], x=box[0], y=box[1],
    156                                    w=box[2], h=box[3], typeCoordinates=CoordinatesType.Absolute, classConfidence=scor,
    157                                    bbType=BBType.Detected, format=BBFormat.XYWH, imgSize=(self.size, self.size))

which is the BoundingBox construction for each image. I fixed this error by changing the class list passed to the VOC metric: it was taking classes[1:] from the new dataset, which has 2 classes (background and hard positive), whereas the older model has 3 classes (background, hard positive, and hard negative).
So I passed the old class list instead of the new one; training then ran without any error, and it shows an NA value for AP-hard negative.

This is how the training is going on (20.00% [10/50 01:17<05:08]):

| epoch | train_loss | valid_loss | pascal_voc_metric | BBloss | focal_loss | AP-hard negative | AP-Hard_positive | time |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.512429 | 0.803247 | nan | 0.036005 | 0.767242 | 0.373483 | nan | 00:04 |
| 1 | 2.400209 | 0.822690 | nan | 0.037761 | 0.784929 | 0.181818 | nan | 00:04 |
| 2 | 2.085350 | 2.831629 | nan | 0.061307 | 2.770323 | 0.318182 | nan | 00:04 |
| 3 | 1.920278 | 1.328259 | nan | 0.056031 | 1.272228 | 0.515152 | nan | 00:09 |
| 4 | 2.108562 | 2.078360 | nan | 0.113781 | 1.964579 | 0.328063 | nan | 00:13 |
| 5 | 1.925919 | 0.492334 | 0.341667 | 0.037471 | 0.454863 | 0.341667 | | 00:08 |
| 6 | 1.729335 | 0.484553 | nan | 0.030840 | 0.453713 | 0.274306 | nan | 00:04 |
| 7 | 1.583344 | 0.438464 | nan | 0.036564 | 0.401901 | 0.385678 | nan | 00:07 |
| 8 | 1.507572 | 0.662398 | 0.487854 | 0.053566 | 0.608832 | 0.487854 | | 00:07 |
| 9 | 1.377387 | 0.389322 | nan | 0.052458 | 0.336864 | 0.350000 | nan | 00:08 |

The following RuntimeWarning is printed repeatedly after each epoch:

/usr/local/lib/python3.7/dist-packages/object_detection_fastai/helper/Evaluator.py:171: RuntimeWarning: invalid value encountered in true_divide

Better model found at epoch 5 with valid_loss value: 0.49233415722846985.
Better model found at epoch 6 with valid_loss value: 0.48455333709716797.
Better model found at epoch 7 with valid_loss value: 0.43846431374549866.
Better model found at epoch 9 with valid_loss value: 0.3893221616744995.
Better model found at epoch 12 with valid_loss value: 0.3428058922290802.

I keep getting this RuntimeWarning, and the voc metric should mostly work for hard positive, as that is the class present in the new dataset.
If anyone finds a mistake in the above implementation, please do post it in this forum thread.