The best method for module cloning with parameter sharing?

It seems that copy.deepcopy(module) is recommended for cloning a module without parameter sharing.

Then, what would be the best approach for cloning with parameter sharing?

I mean, weights and grads are automatically shared if we simply forward the same module with different inputs and work with the outputs. However, if there are other non-module variables that I want to share, I am not sure what the most elegant way would be.
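
For illustration, here is a minimal sketch of what I mean by the automatic sharing (a plain nn.Linear as a stand-in):

import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
out_a = layer(torch.randn(2, 4))  # both passes use the same parameter objects
out_b = layer(torch.randn(2, 4))
(out_a.sum() + out_b.sum()).backward()  # grads from both passes accumulate in layer.weight.grad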

Thanks!

I’d say it really depends on the use case. I can’t think of any general recipe right now.

Any ideas on this please?

Cloning a module is typically done using copy.deepcopy. For example, the official PyTorch implementation of the Transformer uses the following:

import copy
from torch.nn import ModuleList

def _get_clones(module, N):
    return ModuleList([copy.deepcopy(module) for i in range(N)])
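
A quick usage sketch (nn.Linear is just a placeholder): each clone gets its own parameter objects, so nothing is shared between the copies.

import torch.nn as nn

clones = _get_clones(nn.Linear(8, 8), 6)
print(clones[0].weight is clones[1].weight)  # False: deepcopy created separate parameters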

However, when the module contains shared submodules or shared weights, we would expect the clones to keep them shared instead of creating new objects. Unfortunately, the above solution doesn't handle this case.
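
As a side note, one thing that might help here (just a sketch, not a verified general recipe) is the memo argument of copy.deepcopy: pre-seeding it with an id(obj) -> obj entry makes deepcopy return that object as-is, so it stays shared with the original instead of being copied.

import copy
import torch.nn as nn

# Hypothetical example: two linear layers sharing one weight parameter.
a, b = nn.Linear(4, 4), nn.Linear(4, 4)
b.weight = a.weight
container = nn.Sequential(a, b)

shared = a.weight
clone = copy.deepcopy(container, memo={id(shared): shared})
print(clone[0].weight is clone[1].weight)  # True: sharing is preserved inside the clone
print(clone[0].weight is a.weight)         # True: and it is still shared with the original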

As an example of a particular use case, consider torchvision's official implementation of ResNet. In this implementation, all the convolution layers share the same bias variable. Thus, if we clone some convolution layers using deepcopy, new bias terms are created but they are no longer shared, which doesn't make sense.

I hope somebody could come up with a solution. Thank you very much in advance for your help!

Could you point me to the code where the bias is shared, please?
I might have missed it, but this is the first time I've heard that resnets share parameters.

Sorry, my statement was confusing (but still technically correct, I think). The biases are actually not used in the convolution layers (they are created with bias=False), so the bias attributes all point to the same None object.
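
To make that concrete (resnet50 as an example):

import torchvision

model = torchvision.models.resnet50()
print(model.conv1.bias)                                # None: the convs are created with bias=False
print(model.conv1.bias is model.layer1[0].conv1.bias)  # True: every conv's "bias" is the same None object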

To detect shared parameters, I use the following function from fairseq:

def _catalog_shared_params(module, memo=None, prefix=""):
    """Taken from https://github.com/pytorch/fairseq/blob/main/fairseq/trainer.py
    """
    if memo is None:
        first_call = True
        memo = {}
    else:
        first_call = False
    # Group parameter names by the parameter object they refer to.
    for name, param in module._parameters.items():
        param_prefix = prefix + ("." if prefix else "") + name
        if param not in memo:
            memo[param] = []
        memo[param].append(param_prefix)
    # Recurse into submodules, extending the dotted name prefix.
    for name, m in module._modules.items():
        if m is None:
            continue
        submodule_prefix = prefix + ("." if prefix else "") + name
        _catalog_shared_params(m, memo, submodule_prefix)
    # Only the outermost call returns: any group listed under more than one name is shared.
    if first_call:
        return [x for x in memo.values() if len(x) > 1]

We can obtain the list of shared parameters of, e.g., resnet50, as follows:

import torchvision

model = torchvision.models.__dict__['resnet50']()
shared_params = _catalog_shared_params(model)
print(f'shared_params:\n{shared_params}')

Output:

shared_params:
[['conv1.bias', 'layer1.0.conv1.bias', 'layer1.0.conv2.bias', 'layer1.0.conv3.bias', 'layer1.0.downsample.0.bias', 'layer1.1.conv1.bias', 'layer1.1.conv2.bias', 'layer1.1.conv3.bias', 'layer1.2.conv1.bias', 'layer1.2.conv2.bias', 'layer1.2.conv3.bias', 'layer2.0.conv1.bias', 'layer2.0.conv2.bias', 'layer2.0.conv3.bias', 'layer2.0.downsample.0.bias', 'layer2.1.conv1.bias', 'layer2.1.conv2.bias', 'layer2.1.conv3.bias', 'layer2.2.conv1.bias', 'layer2.2.conv2.bias', 'layer2.2.conv3.bias', 'layer2.3.conv1.bias', 'layer2.3.conv2.bias', 'layer2.3.conv3.bias', 'layer3.0.conv1.bias', 'layer3.0.conv2.bias', 'layer3.0.conv3.bias', 'layer3.0.downsample.0.bias', 'layer3.1.conv1.bias', 'layer3.1.conv2.bias', 'layer3.1.conv3.bias', 'layer3.2.conv1.bias', 'layer3.2.conv2.bias', 'layer3.2.conv3.bias', 'layer3.3.conv1.bias', 'layer3.3.conv2.bias', 'layer3.3.conv3.bias', 'layer3.4.conv1.bias', 'layer3.4.conv2.bias', 'layer3.4.conv3.bias', 'layer3.5.conv1.bias', 'layer3.5.conv2.bias', 'layer3.5.conv3.bias', 'layer4.0.conv1.bias', 'layer4.0.conv2.bias', 'layer4.0.conv3.bias', 'layer4.0.downsample.0.bias', 'layer4.1.conv1.bias', 'layer4.1.conv2.bias', 'layer4.1.conv3.bias', 'layer4.2.conv1.bias', 'layer4.2.conv2.bias', 'layer4.2.conv3.bias']]

I’ve just checked again, and it appears that after cloning the biases are still shared, which is expected, so the above is actually not a good example of this use case. Let me come back to this question in the future when I have another example. Thanks!
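
A quick way to verify this, reusing the model and the helper function from above:

import copy

clone = copy.deepcopy(model)
print(_catalog_shared_params(clone))  # reports the same groups, so the (None) biases remain "shared"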