Weird Linear Layer Behaviour when using ModuleList

So I have a list of layers, and I apply a mask to some of the weights in those layers by doing the following:

    def forward(self, input, mask=None):
        if mask is not None:
            # mask some of the weights in place through .data
            self.weight.data.t().mul_(mask)
        f = F.linear(input, self.weight, self.bias)
        return f

This works when I store the layers in a plain Python list (self.hidden_layers = []), but when I change this to self.hidden_layers = nn.ModuleList(), the mask code above no longer works.

This is how I append to the list of layers:

        for i in np.arange(self.num_hidden_layers - 1):
            self.hidden_layers.append(LinearAutoMl(hidden_size, hidden_size).to(device))
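For reference, this is the main functional difference between the two containers as I understand it (a minimal sketch with nn.Linear standing in for LinearAutoMl; not necessarily the cause of the masking issue, just what ModuleList changes):

    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self, hidden_size=4, num_hidden_layers=3):
            super().__init__()
            # plain Python list: the layers are NOT registered as submodules,
            # so net.parameters(), net.to(device), net.state_dict() etc. do not see them
            self.plain_layers = [nn.Linear(hidden_size, hidden_size)
                                 for _ in range(num_hidden_layers - 1)]
            # nn.ModuleList: the layers ARE registered as submodules
            self.module_layers = nn.ModuleList(
                nn.Linear(hidden_size, hidden_size)
                for _ in range(num_hidden_layers - 1)
            )

    net = Net()
    # only the ModuleList layers show up here
    print(sum(p.numel() for p in net.parameters()))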

Any reason why using ModuleList would do this?

Fixed it by changing the line of code that applies the mask to:

    self.weight = nn.Parameter(self.weight * mask.t())
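In context (a sketch, same layer class as the original snippet), the forward then becomes:

    import torch.nn as nn
    import torch.nn.functional as F

    # inside the same layer class as before
    def forward(self, input, mask=None):
        if mask is not None:
            # rewrap the masked weight as a brand new Parameter
            self.weight = nn.Parameter(self.weight * mask.t())
        return F.linear(input, self.weight, self.bias)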

Does the following work?

with torch.no_grad():
    # in-place masking, without recording the op in autograd
    self.weight.mul_(mask.t())

So that worked. Any reason why the first two methods don't work?

.data should not be used anymore, as it has weird semantics.
And rewrapping into an nn.Parameter has the side effect of calling .detach(), meaning that you would get a brand new set of parameters that are not necessarily the same as the ones you passed to the optimizer.
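A quick way to see that side effect (a sketch; the layer and optimizer here are just illustrative):

    import torch
    import torch.nn as nn

    lin = nn.Linear(4, 4)
    opt = torch.optim.SGD(lin.parameters(), lr=0.1)
    old_weight = lin.weight

    # rewrapping creates a brand new (detached) Parameter object ...
    lin.weight = nn.Parameter(lin.weight * 2)

    # ... so the optimizer still holds the old tensor, not the new one
    print(lin.weight is old_weight)                                     # False
    print(any(p is lin.weight for p in opt.param_groups[0]["params"]))  # False
    print(any(p is old_weight for p in opt.param_groups[0]["params"]))  # True

Any steps the optimizer takes after that update old_weight, not the parameter the module actually uses.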


Also, you are changing the weight in place at every forward pass, which is likely not what you want to do.
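If the goal is just to zero out some connections during the forward pass, one alternative (a sketch, not necessarily what you need) is to apply the mask on the fly without touching the stored weight at all:

    import torch.nn.functional as F

    def forward(self, input, mask=None):
        # build a masked copy of the weight for this call only;
        # self.weight itself (and its link to the optimizer) stays untouched
        weight = self.weight if mask is None else self.weight * mask.t()
        return F.linear(input, weight, self.bias)

That way the mask is reapplied to the original weight on every call instead of accumulating in place, and gradients still flow back to self.weight.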
