How to load MoCo model weights that are stored as a state_dict?


I am new to PyTorch and I am trying to load the MoCo model in order to use it.

On the following site I found the code, and I also downloaded the pre-trained model (‘moco_v2_800ep_pretrain.pth.tar’), which is a state_dict with the model’s weights.

I know that in order to use a model I need to create a model instance first and then load the state_dict.

My problem is that I cannot create an instance of the MoCo model. I looked at the code on GitHub and ended up with the following code:

I do not know what to pass as the base_encoder parameter. Alternatively, is there another way to load the model?

Can anyone help me please?

The weights are not for the entire MoCo model (i.e. with encoder, momentum encoder, and queue) but just for ResNet50. Have a look at how they load the weights for ImageNet classification: moco/ at 78b69cafae80bc74cd1a89ac3fb365dc20d157d3 · facebookresearch/moco · GitHub

In this case model = torchvision.models.resnet50()
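For reference, the key renaming that this kind of loading needs can be sketched roughly as below. This is a minimal sketch, not the authors' exact code: the helper names extract_encoder_q_backbone and load_moco_resnet50 are made up for illustration, and it assumes the checkpoint stores its weights under a 'state_dict' entry with keys prefixed by module.encoder_q.

```python
from collections import OrderedDict


def extract_encoder_q_backbone(state_dict, prefix="module.encoder_q."):
    """Return a new dict holding only the query-encoder backbone weights.

    The prefix is stripped so the keys match a plain ResNet50, and the
    fc (embedding) layer is dropped. The input dict is left unchanged.
    """
    backbone = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix) and not key.startswith(prefix + "fc"):
            backbone[key[len(prefix):]] = value
    return backbone


def load_moco_resnet50(checkpoint_path):
    """Hypothetical helper: build a ResNet50 and load the MoCo backbone into it."""
    # Imported here so the renaming helper above works without PyTorch installed.
    import torch
    import torchvision.models as models

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Assumption: the weights live under the 'state_dict' entry.
    state_dict = extract_encoder_q_backbone(checkpoint["state_dict"])

    model = models.resnet50()
    msg = model.load_state_dict(state_dict, strict=False)
    # Only the final fc layer should be left uninitialized by the checkpoint.
    assert set(msg.missing_keys) == {"fc.weight", "fc.bias"}
    return model
```

Calling load_moco_resnet50('moco_v2_800ep_pretrain.pth.tar') should then give a ResNet50 whose fc layer is still randomly initialized.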

Thanks for your reply Conrad.

If I understand it right, I first need to create the ResNet50 model and then load the pre-trained weights into it.

So, according to your answer and the code that you mentioned, the following code gives me the ResNet50 model with the MoCo pre-trained weights:

Yep, that looks right.

Hello, I have one more question.

Based on the above, and after checking the paper again, I realised that the MoCo model outputs a 128-D vector. I am now trying to load that specific model, because the model in the previous posts outputs a 1000-D vector.

According to the following lines of code from the GitHub page: my code is the following:

It seems that I have the wrong checkpoint, but I am not sure. I cannot work out what to load into my MoCo model.

Does anyone have an idea?

Thanks a lot.

They only released the weights for the ResNet50 layers not the entire MoCo model. You’d have to contact the authors if you want those parameters. What’s your goal? Are you trying to fine-tune the pre-trained weights?
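If you are unsure what a checkpoint contains, one quick check is to group its parameter names by prefix. A minimal sketch, where summarize_keys is a hypothetical helper and the 'state_dict' entry in the commented usage is an assumption about the file layout:

```python
from collections import Counter


def summarize_keys(keys, depth=2):
    """Count parameter names by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Example usage (needs PyTorch and the downloaded file):
# import torch
# checkpoint = torch.load("moco_v2_800ep_pretrain.pth.tar", map_location="cpu")
# print(summarize_keys(checkpoint["state_dict"].keys()))
```

The printed counts show which sub-modules the file actually has parameters for.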

Yes, exactly. I want to pass my own images to the MoCo model, take the output for each image (a 128-D vector), and then train my own MLP.

Create the MoCo model and load the weights using:

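import torch
import torchvision.models as models
import moco.builder  # assumes the facebookresearch/moco repository is on the Python path

# base_encoder is the backbone constructor; mlp=True is assumed here for the v2 checkpoint
moco_model = moco.builder.MoCo(models.resnet50, dim=128, mlp=True)

# assumes the weights are stored under the 'state_dict' entry of the checkpoint
checkpoint = torch.load('moco_v2_800ep_pretrain.pth.tar', map_location='cpu')
state_dict = checkpoint['state_dict']

for k in list(state_dict.keys()):
    # retain only encoder_q up to before the embedding layer
    if k.startswith('module.encoder_q') and not k.startswith('module.encoder_q.fc'):
        # remove prefix
        k_no_prefix = k[len("module."):]
        state_dict[k_no_prefix] = state_dict[k]   # leave encoder_q in the param name
        # copy state from the query encoder into a new parameter for the key encoder
        state_dict[k_no_prefix.replace('encoder_q', 'encoder_k')] = state_dict[k]
    del state_dict[k]

# strict=False because the MLP head and the queue are not loaded from the checkpoint
msg = moco_model.load_state_dict(state_dict, strict=False)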

This should give you the MoCo model with initialized query and key encoders (their parameters are copies of each other to begin with). The MLP and queue will still be randomly initialized; you’ll have to train them on your dataset.
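If the goal is only to train your own MLP on top of fixed features, one option is to freeze the pre-trained encoder and train just the new head. The following is a rough sketch under stated assumptions, not part of the MoCo code: FeatureClassifier and freeze are hypothetical names, and feat_dim must match whatever the backbone outputs (2048 for a ResNet50 whose fc has been replaced with nn.Identity()).

```python
import torch
import torch.nn as nn


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients and switch to eval mode so BatchNorm statistics stay fixed."""
    for param in module.parameters():
        param.requires_grad = False
    return module.eval()


class FeatureClassifier(nn.Module):
    """Frozen backbone followed by a small trainable MLP head (illustrative only)."""

    def __init__(self, backbone, feat_dim, hidden_dim, num_classes):
        super().__init__()
        self.backbone = freeze(backbone)
        self.head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        # No gradients are needed through the frozen backbone.
        with torch.no_grad():
            feats = self.backbone(x)
        return self.head(feats)

    def train(self, mode=True):
        # Keep the backbone in eval mode even when the head is training.
        super().train(mode)
        self.backbone.eval()
        return self
```

With a ResNet50 loaded from the checkpoint, you would set backbone.fc = nn.Identity(), pass feat_dim=2048, and give the optimizer only model.head.parameters().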