Correct way to load saved model weights for Faster R-CNN

I use a pretrained model to train a Faster R-CNN, setting both pretrained and pretrained_backbone to True:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# set up model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
num_classes = 2  # 1 class (object) + background

# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features

# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

At inference, however, I’m unsure of the correct way to load my saved model weights.
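For reference, the inference code in this post reads the weights from the 'state_dict' entry of the checkpoint, so the checkpoint is presumably a dict saved with that key. A minimal sketch of a matching save/load pair, assuming that checkpoint layout; save_checkpoint and load_checkpoint are hypothetical helper names, and a tiny nn.Linear stands in for the detector so the example runs without downloading weights:

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(model: nn.Module, path: str) -> None:
    # Assumed layout: the state_dict (parameters + registered buffers) stored
    # under a 'state_dict' key; other entries (epoch, optimizer, ...) could be added.
    torch.save({"state_dict": model.state_dict()}, path)


def load_checkpoint(model: nn.Module, path: str, device: torch.device) -> nn.Module:
    # map_location lets a GPU-saved checkpoint be loaded on a CPU-only machine.
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])  # strict=True by default
    model.to(device)
    model.eval()  # inference mode for dropout/batch-norm layers
    return model


if __name__ == "__main__":
    # Tiny stand-in model (hypothetical), not the Faster R-CNN from the thread.
    src = nn.Linear(4, 2)
    dst = nn.Linear(4, 2)
    path = os.path.join(tempfile.mkdtemp(), "best_model.pth")
    save_checkpoint(src, path)
    load_checkpoint(dst, path, torch.device("cpu"))
    print(torch.equal(src.weight, dst.weight))  # True
```

The model architecture has to be rebuilt exactly as during training (including the replaced box predictor) before load_state_dict is called, otherwise the strict key check fails.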

I get slightly different results depending on how I initialise the model at inference. This is with pretrained set to False:

import torch

# set up model (no pretrained weights; everything should come from the checkpoint)
model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
num_classes = 2  # 1 class (object) + background

# get number of input features for the classifier
in_features = model_test.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model_test.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model_test.to(device)
cpu_device = torch.device("cpu")

# bestmodel is the path to the saved checkpoint
checkpoint = torch.load(bestmodel, map_location=device)
model_test.load_state_dict(checkpoint['state_dict'])
model_test.eval()  # switch to inference mode

And like this with pretrained set to True (the rest of the code is unchanged):

# set up model
model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)

The difference is small, only a few extra object proposals, but it is a difference nonetheless. Any reason why this might be happening?

That’s an interesting observation, as I would assume your load_state_dict operation replaces all parameters and buffers, which apparently isn’t the case.
Could you write a quick check comparing all parameters and buffers of both models to see where a mismatch occurs?

Thanks for the reply. Yes, I assumed that loading the state_dict would override everything.

Would this be sufficient to check the parameters?

pytorch_total_parameters = sum(p.numel() for p in model.parameters())

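No, that only counts the parameters, so it would not detect values that differ. You would have to compare the values themselves, using something like: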

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # 1 class (object) + background

modelA = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
# get number of input features for the classifier
in_features = modelA.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
modelA.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
# get number of input features for the classifier
in_features = model_test.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model_test.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

print(model_test.load_state_dict(modelA.state_dict()))
# <All keys matched successfully>

# compare every parameter and buffer by name
state_test = model_test.state_dict()

for name, param in modelA.named_parameters():
    if not torch.equal(param, state_test[name]):
        print("Mismatch in {}".format(name))

for name, buf in modelA.named_buffers():
    if not torch.equal(buf, state_test[name]):
        print("Mismatch in {}".format(name))

I’ve just got back round to this and tested the code you provided. Apparently there are no mismatches, as nothing is printed.

So I’m not really sure what is going on here. As you have seen, the only thing that changes between the two runs is how the model is constructed.

I have also run inference with just a pretrained model (no saved weights loaded) and with a model that has not been trained on anything at all. Naturally the results are poor, but this at least shows that loading the saved model is what produces the good results.

Could it be that loading the state_dict performs some kind of operation on the previously loaded parameters?

Just to make sure: you did test your actual models using my code snippet and you weren’t able to find any mismatches?

I’m not sure what kind of operation you have in mind, but accessing the state_dict via model.state_dict() and storing it will not change anything in the model, and load_state_dict only copies the stored values into the existing parameters and buffers.
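Both points can be checked on a toy module. This sketch uses a small nn.Linear (a stand-in, not the detection model) and relies on the default load_state_dict behaviour of copying values into the existing tensors rather than replacing them:

```python
import torch
import torch.nn as nn

src = nn.Linear(3, 3)
dst = nn.Linear(3, 3)

# Reading the state_dict does not modify the module.
before = dst.weight.detach().clone()
_ = dst.state_dict()
assert torch.equal(dst.weight, before)

# load_state_dict copies values in place: same Parameter object, new values.
weight_obj = dst.weight
dst.load_state_dict(src.state_dict())
assert dst.weight is weight_obj
assert torch.equal(dst.weight, src.weight)

# Later changes to the source do not propagate to the destination.
with torch.no_grad():
    src.weight.add_(1.0)
assert not torch.equal(dst.weight, src.weight)
print("ok")
```

Anything the module keeps outside its parameters and buffers is not touched by either call.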

If you are still seeing different results, I would guess that some parts of the model are not stored as parameters or buffers and thus do not appear in the state_dict.
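One way to test that guess is to compare the plain Python attributes of every submodule, which state_dict() does not include. A candidate worth checking: the eps of batch-norm layers is an ordinary attribute, and, if I read the torchvision source correctly, some versions overwrite eps to 0.0 on the FrozenBatchNorm2d layers after loading the COCO-pretrained detection weights, which would make the pretrained=True and pretrained=False constructions differ even after loading the same checkpoint (please verify this against your installed torchvision version). The helper below, compare_module_attributes, is a hypothetical sketch demonstrated on small modules so it runs without downloads:

```python
import torch.nn as nn

# Plain attribute types worth comparing; tensors are already covered by state_dict.
SIMPLE_TYPES = (bool, int, float, str, tuple)


def compare_module_attributes(model_a: nn.Module, model_b: nn.Module) -> list:
    """Return (module_name, attribute, value_a, value_b) for every mismatch."""
    mismatches = []
    modules_b = dict(model_b.named_modules())
    for name, mod_a in model_a.named_modules():
        mod_b = modules_b.get(name)
        if mod_b is None:
            mismatches.append((name, "<module>", type(mod_a).__name__, None))
            continue
        if type(mod_a) is not type(mod_b):
            mismatches.append((name, "<type>", type(mod_a).__name__, type(mod_b).__name__))
        for attr, value_a in vars(mod_a).items():
            # Skip private bookkeeping (_parameters, _buffers, hooks, ...).
            if attr.startswith("_") or not isinstance(value_a, SIMPLE_TYPES):
                continue
            value_b = getattr(mod_b, attr, None)
            if value_a != value_b:
                mismatches.append((name, attr, value_a, value_b))
    return mismatches


if __name__ == "__main__":
    a = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8, eps=1e-5))
    b = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8, eps=0.0))
    b.load_state_dict(a.state_dict())  # weights identical, eps still differs
    for m in compare_module_attributes(a, b):
        print(m)  # ('1', 'eps', 1e-05, 0.0)
```

With the two Faster R-CNN models from this thread, compare_module_attributes(model, model_test) would list any such differences; if eps shows up for the backbone's batch-norm layers, that would be a plausible explanation for the extra proposals.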
Could you check which layer first produces different outputs in the two approaches?
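A possible way to find that layer is to register forward hooks on every submodule, run both models on the same input and report the first module whose tensor output differs. This is a sketch under a few assumptions: record_outputs and first_divergent_layer are hypothetical names, only modules returning a single tensor are recorded (Faster R-CNN submodules that return dicts or tuples are skipped), and a module called several times keeps only its last output:

```python
import torch
import torch.nn as nn


def record_outputs(model: nn.Module, inputs) -> dict:
    """Run model once; return {module_name: output} for tensor outputs, in call order."""
    outputs = {}
    handles = []
    for name, module in model.named_modules():
        if name == "":
            continue  # skip the root module itself

        def hook(mod, args, out, name=name):
            if isinstance(out, torch.Tensor):
                outputs[name] = out.detach().clone()

        handles.append(module.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        for h in handles:
            h.remove()  # always clean up the hooks
    return outputs


def first_divergent_layer(model_a: nn.Module, model_b: nn.Module, inputs, atol: float = 0.0):
    """Return (module_name, max_abs_diff) for the first differing output, or (None, 0.0)."""
    model_a.eval()
    model_b.eval()
    outs_a = record_outputs(model_a, inputs)
    outs_b = record_outputs(model_b, inputs)
    for name, out_a in outs_a.items():  # dicts preserve insertion (call) order
        out_b = outs_b.get(name)
        if out_b is None or out_a.shape != out_b.shape:
            return name, None
        diff = (out_a - out_b).abs().max().item()
        if diff > atol:
            return name, diff
    return None, 0.0


if __name__ == "__main__":
    torch.manual_seed(0)
    a = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU())
    b = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4, eps=0.0), nn.ReLU())
    b.load_state_dict(a.state_dict())  # same weights, different eps
    print(first_divergent_layer(a, b, torch.randn(1, 3, 8, 8)))
```

In the demo the convolution outputs match and the batch-norm layer '1' is reported with a small positive difference. For the detection models, inputs would be a list of image tensors, e.g. [image.to(device)].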