Correct way to upload saved model weights for Faster R-CNN

SU801T · July 21, 2022, 12:09pm

I use a pretrained model to train a Faster R-CNN, with pretrained set to true for both the detector and the backbone:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# set up model
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
num_classes = 2  # 1 class (object) + background

# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features

# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

At inference, however, I’m unsure of the correct way of uploading my saved model weights.

I get slightly different results when I initialise my model like this, with pretrained set to false:

# set up model 
model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
num_classes = 2  # 1 class (object) + background

# get number of input features for the classifier
in_features = model_test.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model_test.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model_test.to(device)
cpu_device = torch.device("cpu")

bestmodel = torch.load(bestmodel)
model_test.load_state_dict(bestmodel['state_dict'])

And like this when I set pretrained to true:

# set up model 
model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)

The difference is small, just a few more object proposals, but it is a difference nonetheless. Any reason why this might be happening?

ptrblck · July 21, 2022, 11:09pm

That’s an interesting observation, as I would assume your load_state_dict operation would replace all parameters and buffers, which apparently isn’t the case.
Could you write a quick check comparing all parameters and buffers of both models and see where a mismatch occurs?

SU801T · July 21, 2022, 11:52pm

Thanks for the reply. Yes, I assumed that loading the state_dict would override everything.

Would this be sufficient to check the parameters?

pytorch_total_parameters = sum(p.numel() for p in model.parameters())

ptrblck · July 22, 2022, 12:01am

No, you would have to use something like:

import torchvision

modelA = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
num_classes = 2  # 1 class (object) + background

# get number of input features for the classifier
in_features = modelA.roi_heads.box_predictor.cls_score.in_features

# replace the pre-trained head with a new one
modelA.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)


model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
num_classes = num_classes  # 1 class (object) + background 

# get number of input features for the classifier
in_features = model_test.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model_test.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)

model_test.load_state_dict(modelA.state_dict())
# <All keys matched successfully>


for name, param in modelA.named_parameters():
    param_test = model_test.state_dict()[name]
    if not (param==param_test).all():
        print("Mismatch in {}".format(name))

for name, param in modelA.named_buffers():
    param_test = model_test.state_dict()[name]
    if not (param==param_test).all():
        print("Mismatch in {}".format(name))
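
The same check can be wrapped in a small helper that also reports keys present in only one of the two models. This is only a minimal sketch: `compare_state_dicts` is a hypothetical name, the two `nn.Sequential` networks are toy stand-ins for the detection models, and `torch.equal` is used so that tensors with different shapes count as mismatches instead of raising an error.

```python
import torch
import torch.nn as nn


def compare_state_dicts(model_a, model_b):
    """Return the state_dict keys whose tensors differ between two models."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    mismatches = []
    for name in sorted(set(sd_a) | set(sd_b)):
        if name not in sd_a or name not in sd_b:
            mismatches.append(name)  # key exists in only one model
        elif not torch.equal(sd_a[name], sd_b[name]):
            mismatches.append(name)  # same key, different shape or values
    return mismatches


# Toy stand-ins for the two detection models (assumption for illustration)
net_a = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))
net_b = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))
print(compare_state_dicts(net_a, net_b))  # random init, so the Linear layer differs

net_b.load_state_dict(net_a.state_dict())
print(compare_state_dicts(net_a, net_b))  # nothing left to report after copying
```

Because it iterates over `state_dict()`, the helper covers parameters and persistent buffers in one pass, but it cannot see anything that is not registered in the state_dict.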

SU801T · July 23, 2022, 1:41am

So I’ve just got back round to this and tested the code you provided. Apparently there are no mismatches, as nothing is printed.

So I’m not really sure what is going on here. The only thing that changes with this code is the loading of the model as you have seen.

I have also done inference with just a pretrained model (no weights uploaded) and with a model that is not trained on anything whatsoever. Naturally the results are poor, but this nonetheless shows that uploading a saved model produces the desired results.

Could it be that uploading the state dicts does some kind of operation on the previously loaded parameters?

ptrblck · July 23, 2022, 3:19am

Just to make sure: you did test your actual models using my code snippet and you weren’t able to find any mismatches?

I’m not sure what “uploading” the state_dict means in this context, but accessing it via model.state_dict() and storing it will not change anything in the model.

If you are still seeing different results I would guess that parts of the model are not initialized as parameters or buffers and thus not shown in the state_dict.
Could you check which layer creates the different outputs in both approaches?
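
To illustrate the point about state that is not registered as a parameter or buffer, here is a toy sketch. `Scaler` is a hypothetical module made up for this example and is not part of torchvision. It holds a plain tensor attribute and a non-persistent buffer, neither of which appears in the state_dict, so two instances can produce different outputs even though load_state_dict reports that all keys matched.

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    """Hypothetical module mixing registered and unregistered state."""

    def __init__(self, scale):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1))        # saved in state_dict
        self.register_buffer("offset", torch.zeros(1))   # saved in state_dict
        # Non-persistent buffer: moved by .to(device), but not saved
        self.register_buffer("tmp", torch.zeros(1), persistent=False)
        # Plain tensor attribute: invisible to state_dict
        self.scale = torch.tensor(scale)

    def forward(self, x):
        return x * self.weight * self.scale + self.offset


m1, m2 = Scaler(2.0), Scaler(3.0)
m2.load_state_dict(m1.state_dict())   # succeeds, all keys match
print(list(m1.state_dict().keys()))   # ['weight', 'offset']

x = torch.ones(1)
print(torch.equal(m1(x), m2(x)))      # False: `scale` was never copied
```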

SU801T · August 31, 2022, 3:34pm

So I implemented it like this:


    modelA = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, pretrained_backbone=True)
    num_classes = 2  # 1 class (object) + background

    # get number of input features for the classifier
    in_features = modelA.roi_heads.box_predictor.cls_score.in_features

    # replace the pre-trained head with a new one
    modelA.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)

    modelA.load_state_dict(bestmodel['state_dict'])  #LOAD STATE DICT


    model_test = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, pretrained_backbone=False)
    num_classes = num_classes  # 1 class (object) + background 

    # get number of input features for the classifier
    in_features = model_test.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model_test.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)

    # model_test.load_state_dict(bestmodel['state_dict'])   #LOAD STATE DICT and uncomment for comparison 

    model_test.load_state_dict(modelA.state_dict())
    # <All keys matched successfully>


    for name, param in modelA.named_parameters():
        param_test = model_test.state_dict()[name]
        if not (param==param_test).all():
            print("Mismatch in {}".format(name))

    for name, param in modelA.named_buffers():
        param_test = model_test.state_dict()[name]
        if not (param==param_test).all():
            print("Mismatch in {}".format(name))

I tried different variations of loading the models in the code above and get no mismatches.

I’m certain I save the models correctly, and the first post shows how I initialise my models.

How would I check which layers create the different outputs in both approaches?

ptrblck · August 31, 2022, 6:49pm

You could use forward hooks as described e.g. here, store the intermediate results in a dict or list for both models, and compare them using the same model input (and calling model.eval() to avoid random operations from e.g. dropout if these layers are used).
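
A minimal sketch of that approach, using two small toy networks as stand-ins for the detection models. The helper name `record_outputs` and the deliberate perturbation of the last layer are assumptions made for this illustration. The hook only stores outputs that are plain tensors; modules in a detection model that return dicts, lists or tuples would need extra handling.

```python
import torch
import torch.nn as nn


def record_outputs(model, store):
    """Register a forward hook on every leaf module; return the hook handles."""
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            def hook(mod, inputs, output, name=name):
                if torch.is_tensor(output):
                    store[name] = output.detach().clone()
            handles.append(module.register_forward_hook(hook))
    return handles


def make_net():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 2))


torch.manual_seed(0)
net_a, net_b = make_net(), make_net()
net_b.load_state_dict(net_a.state_dict())
with torch.no_grad():
    net_b[3].weight.add_(0.1)  # deliberately perturb the last layer

# eval() disables dropout so both forward passes are deterministic
net_a.eval()
net_b.eval()

acts_a, acts_b = {}, {}
handles = record_outputs(net_a, acts_a) + record_outputs(net_b, acts_b)

x = torch.randn(5, 4)  # the same input goes through both models
with torch.no_grad():
    net_a(x)
    net_b(x)

# Names of the layers whose outputs differ, in execution order
diff = [name for name in acts_a if not torch.allclose(acts_a[name], acts_b[name])]
print("Layers with differing outputs:", diff)

for h in handles:
    h.remove()  # clean up the hooks once the comparison is done
```

The first name in `diff` is the earliest layer where the two models diverge, which is the place to start looking.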