How to use the helper function consume_prefix_in_state_dict_if_present

John_J_Watson · August 16, 2021, 7:11pm

The docs suggest using:

To let a non-DDP model load a state dict from a DDP model, consume_prefix_in_state_dict_if_present() needs to be applied to strip the prefix “module.” in the DDP state dict before loading.

Source: DistributedDataParallel — PyTorch 1.9.0 documentation

However, this line:

model_state_dict=torch.nn.modules.utils.consume_prefix_in_state_dict_if_present(checkpoint['model_dict'],prefix='module.')

returns None.

I do something ugly like this at the moment:

        new_state_dict = collections.OrderedDict()
        for k, v in checkpoint['model_dict'].items():
            name = k.replace("module.", '') # remove `module.`
            new_state_dict[name] = v

but I would prefer to use the consume_prefix_in_state_dict_if_present.

Can someone elucidate the correct usage of this please? Obviously, I am not getting it!

mrshenli · August 17, 2021, 2:43am

Calling for an expert: cc @wayi

wayi · August 17, 2021, 4:22am

Hi @John_J_Watson, sorry for the confusion. consume_prefix_in_state_dict_if_present removes the prefix in place rather than returning a value. After the call, you can use checkpoint['model_dict'] directly instead of assigning the result to a temporary variable.

Example: pytorch/test_c10d_gloo.py at 34c9f5a8dad74ba23de5c2fba9d071a6c2dd1fa4 · pytorch/pytorch · GitHub

I created a PR to improve the documentation: Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method by SciPioneer · Pull Req
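To make the in-place behavior concrete, here is a minimal sketch. The toy `torch.nn.Linear` model and the simulated checkpoint are assumptions for illustration only; the 'model_dict' key follows the question above.

```python
from collections import OrderedDict

import torch
from torch.nn.modules.utils import consume_prefix_in_state_dict_if_present

# Toy model standing in for the real network (hypothetical).
model = torch.nn.Linear(4, 2)

# Simulate a checkpoint saved from a DDP-wrapped model:
# every key carries the "module." prefix.
checkpoint = {
    "model_dict": OrderedDict(
        ("module." + k, v.clone()) for k, v in model.state_dict().items()
    )
}

# The helper mutates checkpoint["model_dict"] in place and returns None,
# so its return value should not be assigned to anything.
result = consume_prefix_in_state_dict_if_present(checkpoint["model_dict"], "module.")

# The same dict object now holds the un-prefixed keys.
keys = sorted(checkpoint["model_dict"].keys())
model.load_state_dict(checkpoint["model_dict"])
```

After the call, `result` is None and the keys in `checkpoint["model_dict"]` no longer start with "module.", so the dict can be passed straight to `load_state_dict` on the non-DDP model.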

John_J_Watson · August 18, 2021, 12:55pm

Thank you @wayi for this. I don't know why I didn't think to check whether it could be in place!
Just as a follow-up question, does this helper basically do the same as the following?

        new_state_dict = collections.OrderedDict()
        for k, v in checkpoint['model_dict'].items():
            name = k.replace("module.", '') # remove `module.`
            new_state_dict[name] = v

Is that right? Are there any advantages or disadvantages to using either approach?

Thank you again!

wayi · August 24, 2021, 7:23pm

One subtle difference is that the “_metadata” field (if any) is handled separately. See:

github.com

pytorch/pytorch/blob/e000dfcf976454fdadfdc556248976e6e560d155/torch/nn/modules/utils.py#L63

            state_dict (OrderedDict): a state-dict to be loaded to the model.
            prefix (str): prefix.
        """
        keys = sorted(state_dict.keys())
        for key in keys:
            if key.startswith(prefix):
                newkey = key[len(prefix) :]
                state_dict[newkey] = state_dict.pop(key)

        # also strip the prefix in metadata if any.
        if "_metadata" in state_dict:
            metadata = state_dict["_metadata"]
            for key in list(metadata.keys()):
                # for the metadata dict, the key can be:
                # '': for the DDP module, which we want to remove.
                # 'module': for the actual model.
                # 'module.xx.xx': for the rest.

                if len(key) == 0:
                    continue
                newkey = key[len(prefix) :]

Other than that, I don’t think there is a big difference. Your own implementation is a little less memory efficient, I would say, because it builds a second dict instead of modifying the existing one in place.
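One more behavioral difference that can be checked in plain Python: `str.replace` removes every occurrence of "module." in a key, while the `key[len(prefix) :]` logic quoted above only strips a leading prefix. The sketch below uses made-up keys and placeholder values (both hypothetical) and does not need torch.

```python
from collections import OrderedDict

PREFIX = "module."

# Hypothetical keys; the second one contains "module." twice on purpose.
state_dict = OrderedDict(
    [
        ("module.fc.weight", 1),
        ("module.encoder.module.bias", 2),
    ]
)

# Approach from the question: str.replace removes every occurrence.
replaced = OrderedDict((k.replace(PREFIX, ""), v) for k, v in state_dict.items())

# Prefix-only stripping, mirroring the key[len(prefix):] logic.
stripped = OrderedDict(
    (k[len(PREFIX):] if k.startswith(PREFIX) else k, v)
    for k, v in state_dict.items()
)

print(list(replaced))  # ['fc.weight', 'encoder.bias']
print(list(stripped))  # ['fc.weight', 'encoder.module.bias']
```

This only matters if a submodule of the model is itself named "module", but in that case the `replace` version would produce keys that no longer match the model.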