Weird Linear Layer Behaviour when using ModuleList

So I have a list of layers, and I apply a mask to some of the weights in those layers by doing the following:

    def forward(self, input, mask=None):
        if mask is not None:
            # mask some of the weights in place through .data
            self.weight.data.t().mul_(mask)
        f = F.linear(input, self.weight, self.bias)
        return f

This works when I store the layers in a plain Python list (self.hidden_layers = []), but when I change this to self.hidden_layers = nn.ModuleList(), the mask code above no longer works.

This is how I append to the list of layers:

        for i in np.arange(self.num_hidden_layers - 1):
            self.hidden_layers.append(LinearAutoMl(hidden_size, hidden_size).to(device))
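For reference, this is the main functional difference between the two containers as I understand it (a minimal sketch with nn.Linear standing in for LinearAutoMl; not necessarily the cause of the masking issue, just what ModuleList changes):

    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self, hidden_size=4, num_hidden_layers=3):
            super().__init__()
            # plain Python list: the layers are NOT registered as submodules,
            # so net.parameters(), net.to(device), net.state_dict() etc. do not see them
            self.plain_layers = [nn.Linear(hidden_size, hidden_size)
                                 for _ in range(num_hidden_layers - 1)]
            # nn.ModuleList: the layers ARE registered as submodules
            self.module_layers = nn.ModuleList(
                nn.Linear(hidden_size, hidden_size)
                for _ in range(num_hidden_layers - 1)
            )

    net = Net()
    # only the ModuleList layers show up here
    print(sum(p.numel() for p in net.parameters()))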

Any reason why using ModuleList would do this?

Fixed it by changing the line of code that applies the mask to:

    self.weight = nn.Parameter(self.weight * mask.t())
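In context (a sketch, same layer class as the original snippet), the forward then becomes:

    import torch.nn as nn
    import torch.nn.functional as F

    # inside the same layer class as before
    def forward(self, input, mask=None):
        if mask is not None:
            # rewrap the masked weight as a brand new Parameter
            self.weight = nn.Parameter(self.weight * mask.t())
        return F.linear(input, self.weight, self.bias)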

Does the following work?

with torch.no_grad():
    # in-place masking, without recording the op in autograd
    self.weight.mul_(mask.t())

So that worked. Any reason why the first two methods don't work?

.data should not be used anymore, as it has weird semantics.
And rewrapping into an nn.Parameter has the side effect of calling .detach(), meaning that you would get a brand new set of parameters that are not necessarily the same as the ones you passed to the optimizer.
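A quick way to see that side effect (a sketch; the layer and optimizer here are just illustrative):

    import torch
    import torch.nn as nn

    lin = nn.Linear(4, 4)
    opt = torch.optim.SGD(lin.parameters(), lr=0.1)
    old_weight = lin.weight

    # rewrapping creates a brand new (detached) Parameter object ...
    lin.weight = nn.Parameter(lin.weight * 2)

    # ... so the optimizer still holds the old tensor, not the new one
    print(lin.weight is old_weight)                                     # False
    print(any(p is lin.weight for p in opt.param_groups[0]["params"]))  # False
    print(any(p is old_weight for p in opt.param_groups[0]["params"]))  # True

Any steps the optimizer takes after that update old_weight, not the parameter the module actually uses.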


Also, you are changing the weight in place at every forward pass, which is likely not what you want to do.
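If the goal is just to zero out some connections during the forward pass, one alternative (a sketch, not necessarily what you need) is to apply the mask on the fly without touching the stored weight at all:

    import torch.nn.functional as F

    def forward(self, input, mask=None):
        # build a masked copy of the weight for this call only;
        # self.weight itself (and its link to the optimizer) stays untouched
        weight = self.weight if mask is None else self.weight * mask.t()
        return F.linear(input, weight, self.bias)

That way the mask is reapplied to the original weight on every call instead of accumulating in place, and gradients still flow back to self.weight.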
