The best method for module cloning with parameter sharing?

It seems that copy.deepcopy(module) is recommended for cloning a module without parameter sharing.

Then, what would be the best approach for cloning with parameter sharing?

I mean, weights and grads are automatically shared if we simply forward the same module with different inputs and handle the outputs separately. However, if there are other non-module variables that I want to share, I am not sure what the most elegant way would be.
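For the weight-sharing part, one common pattern (a minimal sketch, not an official recipe) is to assign the same nn.Parameter objects to a second instance after construction, so both modules literally hold the same tensors:

```python
import torch
import torch.nn as nn

# Two independently constructed layers with identical shapes.
a = nn.Linear(4, 4)
b = nn.Linear(4, 4)

# Re-point b's parameters at a's: both modules now hold the
# very same Parameter objects, so weights and grads are shared.
b.weight = a.weight
b.bias = a.bias

assert b.weight is a.weight  # same object, not a copy
```

After this, a backward pass through either module accumulates gradients into the same `.grad` tensors. Non-parameter attributes would have to be re-pointed the same way, which is why there is probably no fully general recipe.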



I’d say it really depends on the use case. I can’t think of any general recipe right now.

Any ideas on this please?
Cloning a module is typically done using copy.deepcopy. For example, the official PyTorch implementation of the Transformer uses the following:

import copy

from torch.nn import ModuleList


def _get_clones(module, N):
    # Each clone gets its own, independent copy of every parameter.
    return ModuleList([copy.deepcopy(module) for _ in range(N)])

However, when the module contains shared submodules or shared weights, one would expect the sharing to be preserved rather than new, independent objects being created. Unfortunately, the solution above doesn't cover this case.
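To see why new objects are created: each call to copy.deepcopy starts from a fresh memo, so every clone receives brand-new parameter objects, independent both of the original and of its sibling clones. A quick check:

```python
import copy

import torch.nn as nn

block = nn.Linear(4, 4)
clones = nn.ModuleList([copy.deepcopy(block) for _ in range(2)])

# Each clone owns fresh parameters, unrelated to the original
# module and to the other clones.
assert clones[0].weight is not block.weight
assert clones[0].weight is not clones[1].weight
```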

As a concrete use case, consider torchvision's official implementation of ResNet. In that implementation, all the convolution layers share the same bias variable. Thus, if we clone some convolution layers using deepcopy, new bias terms are created that are no longer shared, which doesn't make sense.

I hope somebody can come up with a solution. Thank you very much in advance for your help!

Could you point me to the code where the bias is shared, please?
I might have missed it but it’s the first time I hear that resnets share parameters.


Sorry, my statement was confusing (but still technically correct, I think). Actually, the biases are not used in the convolution layers at all, and thus they all point to the same object (None).
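To make this concrete, here is a minimal illustration with plain nn.Conv2d (rather than the full ResNet): layers built with bias=False have .bias set to None, and since None is a singleton in Python, all these "biases" are literally the same object:

```python
import torch.nn as nn

# Convolutions created with bias=False, as in torchvision's ResNet.
conv1 = nn.Conv2d(3, 8, kernel_size=3, bias=False)
conv2 = nn.Conv2d(8, 8, kernel_size=3, bias=False)

assert conv1.bias is None and conv2.bias is None
# None is a singleton, so the two "biases" are the same object.
assert conv1.bias is conv2.bias
```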

To detect shared parameters, I use the following function from fairseq:

def _catalog_shared_params(module, memo=None, prefix=""):
    """Taken from fairseq."""
    if memo is None:
        first_call = True
        memo = {}
    else:
        first_call = False
    for name, param in module._parameters.items():
        param_prefix = prefix + ("." if prefix else "") + name
        if param not in memo:
            memo[param] = []
        memo[param].append(param_prefix)
    for name, m in module._modules.items():
        if m is None:
            continue
        submodule_prefix = prefix + ("." if prefix else "") + name
        _catalog_shared_params(m, memo, submodule_prefix)
    if first_call:
        # Only names that map to the same object more than once are shared.
        return [x for x in memo.values() if len(x) > 1]

We can obtain the list of shared parameters of, e.g., resnet50, as follows:

import torchvision

model = torchvision.models.__dict__['resnet50']()
shared_params = _catalog_shared_params(model)
print(shared_params)


[['conv1.bias', 'layer1.0.conv1.bias', 'layer1.0.conv2.bias', 'layer1.0.conv3.bias', 'layer1.0.downsample.0.bias', 'layer1.1.conv1.bias', 'layer1.1.conv2.bias', 'layer1.1.conv3.bias', 'layer1.2.conv1.bias', 'layer1.2.conv2.bias', 'layer1.2.conv3.bias', 'layer2.0.conv1.bias', 'layer2.0.conv2.bias', 'layer2.0.conv3.bias', 'layer2.0.downsample.0.bias', 'layer2.1.conv1.bias', 'layer2.1.conv2.bias', 'layer2.1.conv3.bias', 'layer2.2.conv1.bias', 'layer2.2.conv2.bias', 'layer2.2.conv3.bias', 'layer2.3.conv1.bias', 'layer2.3.conv2.bias', 'layer2.3.conv3.bias', 'layer3.0.conv1.bias', 'layer3.0.conv2.bias', 'layer3.0.conv3.bias', 'layer3.0.downsample.0.bias', 'layer3.1.conv1.bias', 'layer3.1.conv2.bias', 'layer3.1.conv3.bias', 'layer3.2.conv1.bias', 'layer3.2.conv2.bias', 'layer3.2.conv3.bias', 'layer3.3.conv1.bias', 'layer3.3.conv2.bias', 'layer3.3.conv3.bias', 'layer3.4.conv1.bias', 'layer3.4.conv2.bias', 'layer3.4.conv3.bias', 'layer3.5.conv1.bias', 'layer3.5.conv2.bias', 'layer3.5.conv3.bias', 'layer4.0.conv1.bias', 'layer4.0.conv2.bias', 'layer4.0.conv3.bias', 'layer4.0.downsample.0.bias', 'layer4.1.conv1.bias', 'layer4.1.conv2.bias', 'layer4.1.conv3.bias', 'layer4.2.conv1.bias', 'layer4.2.conv2.bias', 'layer4.2.conv3.bias']]

I’ve just checked again, and it appears that after cloning the new biases are still shared, which is expected; so the above is actually not a good example of this use case. Let me come back to this question in the future when I have another example. Thanks!
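For reference, this is because copy.deepcopy tracks already-copied objects in its memo dict, so sharing that is internal to the copied module is preserved in the clone. A minimal sketch with explicitly tied weights:

```python
import copy

import torch.nn as nn


class Tied(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(4, 4)
        self.dec = nn.Linear(4, 4)
        # Tie the two weight matrices to the same Parameter object.
        self.dec.weight = self.enc.weight


m = Tied()
clone = copy.deepcopy(m)

# The clone's parameters are new objects ...
assert clone.enc.weight is not m.enc.weight
# ... but the tie inside the clone is preserved.
assert clone.dec.weight is clone.enc.weight
```

Sharing that spans across separately deep-copied modules (e.g., between two clones produced by separate deepcopy calls) is a different matter, since each call starts with a fresh memo.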