Sharing partial weights for different Conv2d layers

I'm wondering whether it's possible to let two different Conv2d layers (with different numbers of output channels) share (part of) their weights.

I know it's possible to implement this via torch.nn.functional.conv2d, like the following:

import torch

class DNet(torch.nn.Module):
    def __init__(self):
        super(DNet, self).__init__()
        # Shared filter bank: 64 filters of shape 3x7x7, reused by both branches below.
        self.W0 = torch.nn.Parameter(torch.randn(64, 3, 7, 7), requires_grad=True)
        self.avgpool = torch.nn.AdaptiveAvgPool2d((1, 1))
        self.fc0 = torch.nn.Linear(64, 1000)
        self.fc1 = torch.nn.Linear(16, 1000)

    def forward(self, input, **kwargs):
        signal = kwargs["signal"]
        if signal == 0:
            # Use all 64 shared filters.
            x = torch.nn.functional.conv2d(input, self.W0, None, 1)
            fc = self.fc0
        else:
            # Use only the first 16 filters, i.e. a slice of the same parameter.
            x = torch.nn.functional.conv2d(input, self.W0[:16], None, 1)
            fc = self.fc1
        return fc(torch.flatten(self.avgpool(x), 1))

model = DNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model.train()
for i in range(42):
    # Alternate between the 64-channel and 16-channel branches on each iteration.
    pred = model(torch.randn(1, 3, 224, 224), signal=i % 2)
    loss = torch.nn.CrossEntropyLoss()(pred, torch.tensor([666]))
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print("iter:%04d pred:%d loss:%.4f" % (i, torch.argmax(pred.detach()), loss.item()))

But this is not elegant when working with something like ResNet, because we would need to rewrite the entire __init__ function around the functional API. It would be much easier to just share the filter weights between two ordinary Conv2d modules directly, but I don't understand why the following doesn't work:

conv1.weight = conv0.weight[:16]
# or conv1.weight = torch.nn.Parameter(conv0.weight[:16]); but that just gives
# conv1 its own new set of parameters, so training it won't affect the ones in conv0

I know it works if we simply do conv1.weight = conv0.weight, but I don't understand why we can't assign a slice while still keeping both layers pointing to the same underlying data. Any suggestions or explanation? Thanks.
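
For concreteness, here is a minimal standalone version of what I mean (conv0 and conv1, with their 7x7 kernels and bias=False, are made up purely for illustration):

conv0 = torch.nn.Conv2d(3, 64, kernel_size=7, bias=False)
conv1 = torch.nn.Conv2d(3, 16, kernel_size=7, bias=False)

conv1.weight = conv0.weight        # works: conv1 now reuses the exact same Parameter
conv1.weight = conv0.weight[:16]   # raises a TypeError from nn.Module.__setattr__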