How to use the helper function consume_prefix_in_state_dict_if_present

The docs suggest using:

To let a non-DDP model load a state dict from a DDP model, consume_prefix_in_state_dict_if_present() needs to be applied to strip the prefix “module.” in the DDP state dict before loading.

Source: DistributedDataParallel — PyTorch 1.9.0 documentation

However, this line:

model_state_dict=torch.nn.modules.utils.consume_prefix_in_state_dict_if_present(checkpoint['model_dict'],prefix='module.')

gives back None :frowning:

I do something ugly like this at the moment:

        new_state_dict = collections.OrderedDict()
        for k, v in checkpoint['model_dict'].items():
            name = k.replace("module.", '') # remove `module.`
            new_state_dict[name] = v

but I would prefer to use the consume_prefix_in_state_dict_if_present.

Can someone elucidate the correct usage of this please? Obviously, I am not getting it!

calling for expert :slight_smile: cc @wayi

Hi @John_J_Watson, sorry for the confusion. consume_prefix_in_state_dict_if_present removes the prefix in place rather than returning a value. Just call it on checkpoint['model_dict'] and keep using checkpoint['model_dict'] afterwards, instead of assigning the result to a temporary variable.
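To make the in-place behaviour concrete, here is a minimal sketch. It assumes a hypothetical checkpoint whose values are plain floats rather than tensors, and it falls back to a small stand-in function (an assumption, not the library code) if PyTorch is not installed.

```python
# Use the real helper when PyTorch is installed; otherwise fall back to a
# hypothetical stand-in that mimics its in-place behaviour for plain dicts.
try:
    from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present
except ImportError:
    def consume_prefix_in_state_dict_if_present(state_dict, prefix):
        """Stand-in (assumption): strip `prefix` from matching keys in place."""
        for key in list(state_dict.keys()):
            if key.startswith(prefix):
                state_dict[key[len(prefix):]] = state_dict.pop(key)

# Hypothetical checkpoint saved from a DDP-wrapped model; plain floats stand
# in for tensors to keep the example small.
checkpoint = {"model_dict": {"module.fc.weight": 1.0, "module.fc.bias": 0.5}}

# The call mutates checkpoint["model_dict"] and returns None, so assigning
# its result (as in the question) just gives you None.
result = consume_prefix_in_state_dict_if_present(checkpoint["model_dict"], "module.")

print(result)                            # None
print(sorted(checkpoint["model_dict"]))  # ['fc.bias', 'fc.weight']

# With a real model, the stripped dict would then be loaded directly:
# model.load_state_dict(checkpoint["model_dict"])
```

The key point is that the dict you passed in is the one to load afterwards.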

Example: pytorch/test_c10d_gloo.py at 34c9f5a8dad74ba23de5c2fba9d071a6c2dd1fa4 · pytorch/pytorch · GitHub

I created a PR to improve the documentation: Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method by SciPioneer · Pull Req

Thank you @wayi for this. I don't know why it didn't occur to me that it could be in place!
Just as a follow-up question, does this wrapper basically do the same as the following?

        new_state_dict = collections.OrderedDict()
        for k, v in checkpoint['model_dict'].items():
            name = k.replace("module.", '') # remove `module.`
            new_state_dict[name] = v

Is that right? Are there any advantages/disadvantages to using either approach?

Thank you again!

One subtle difference is that the “_metadata” field (if any) is handled separately. See the function's source in torch.nn.modules.utils.

Other than that, I don’t think there is a big difference. I would say your own implementation is slightly less memory-efficient, since it builds a new dict instead of modifying the existing one in place.
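For anyone comparing the two approaches, here is a rough pure-Python sketch. The function names strip_with_replace and strip_prefix_in_place are hypothetical, the values are floats instead of tensors, and “_metadata” handling is left out entirely. Besides the copy-versus-in-place point, it shows one more difference: str.replace removes every occurrence of "module.", not just a leading one, which matters if a submodule happens to be named module.

```python
from collections import OrderedDict


def strip_with_replace(state_dict, prefix="module."):
    """The loop from the question: builds a new dict and removes every
    occurrence of `prefix` in each key, not only a leading one."""
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        new_state_dict[k.replace(prefix, "")] = v
    return new_state_dict


def strip_prefix_in_place(state_dict, prefix="module."):
    """Hypothetical in-place variant: strips only a leading `prefix`
    and mutates the dict that was passed in."""
    for k in list(state_dict.keys()):
        if k.startswith(prefix):
            state_dict[k[len(prefix):]] = state_dict.pop(k)


# Hypothetical DDP-style keys; the second has a submodule named "module".
ddp_state = OrderedDict([
    ("module.encoder.weight", 1.0),
    ("module.head.module.bias", 2.0),
])

copied = strip_with_replace(ddp_state)
print(list(copied))         # ['encoder.weight', 'head.bias']
print(copied is ddp_state)  # False: a second dict was allocated

strip_prefix_in_place(ddp_state)
print(list(ddp_state))      # ['encoder.weight', 'head.module.bias']
```

For typical models without a submodule called module, both give the same keys, so the choice is mostly about convenience.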