Do unused layers that are not included in the forward pass affect training/evaluating?

Fawaz_Sammani · December 11, 2019, 6:16am

Hi, I would like to ask: if I define some layers in the __init__ function of a class but do not use (call) them in the forward function, does that affect anything?

For example, take the code below:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.hidden_layer_1 = nn.Linear(hidden_size, hidden_size)
        self.hidden_layer_2 = nn.Linear(hidden_size, hidden_size)     # not used/not called in forward pass
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = F.relu(self.input_layer(x))
        out = F.relu(self.hidden_layer_1(out))
        out = torch.sigmoid(self.output_layer(out))
        return out

alex.veuthey · December 11, 2019, 7:49am

The best way to evaluate that is to try it. But here are my thoughts:

Since it’s been instantiated, and probably sent to GPU, it will consume some memory, as the parameters are created and must be kept somewhere. However, gradients won’t be computed, as it’s not present in the forward method. Also, if you send the whole model.parameters() to the optimizer, it might consume a bit more memory again? Not sure about this one.
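One way to check the points above is a small experiment. The sketch below reuses the Net class from the question with arbitrary sizes (4, 8, 2) and an Adam optimizer, both of which are assumptions chosen for illustration. It runs one backward pass and one optimizer step, then inspects which parameters received no gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.input_layer = nn.Linear(input_size, hidden_size)
        self.hidden_layer_1 = nn.Linear(hidden_size, hidden_size)
        self.hidden_layer_2 = nn.Linear(hidden_size, hidden_size)  # never called in forward
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = F.relu(self.input_layer(x))
        out = F.relu(self.hidden_layer_1(out))
        return torch.sigmoid(self.output_layer(out))

torch.manual_seed(0)
model = Net(4, 8, 2)  # sizes are arbitrary, for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
unused_before = model.hidden_layer_2.weight.detach().clone()

loss = model(torch.randn(3, 4)).sum()
loss.backward()
optimizer.step()

# Parameters that took no part in the forward pass keep grad == None.
no_grad_names = [n for n, p in model.named_parameters() if p.grad is None]
print(no_grad_names)  # ['hidden_layer_2.weight', 'hidden_layer_2.bias']

# The optimizer skips parameters without a gradient, so they stay unchanged.
print(torch.equal(unused_before, model.hidden_layer_2.weight))  # True

# Number of parameters for which the optimizer allocated per-parameter state.
print(len(optimizer.state))
```

In this sketch the unused layer still occupies memory for its own weights, but Adam only creates its per-parameter buffers for parameters that actually have a gradient, so the optimizer-side overhead for the unused layer appears to be negligible.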

In such cases, I would pass an additional argument to the model's constructor that controls whether the layers are created, to save a bit of memory. The argument itself does not need to be stored if it is only used during initialization.

Eta_C · December 11, 2019, 8:14am

In addition, I suggest something like this:

class Net(nn.Module):
    def __init__(self, use_linear):
        super().__init__()
        self.use_linear = use_linear
        self.conv_layer = nn.Conv2d(3, 3, 1)
        if self.use_linear:
            self.linear_layer = nn.Linear(12, 8)
    def forward(self, x):
        x = self.conv_layer(x)
        return self.linear_layer(x) if self.use_linear else x

Fawaz_Sammani · December 11, 2019, 8:26am

Hi @alex.veuthey and thanks for your answer.
I did already. I had to repeat training from the beginning, but I wanted to give it a try. Comparing the results from the first epoch with and without these extra parameters, I find that they make no difference. However, if I save the optimizer using torch.save and then load it back using torch.load, there is an error. Most probably this is because the optimizer has the parameters saved in its dictionary but isn't updating them. Not sure about this.
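The exact error is not shown here, so its cause can only be guessed. One scenario that does raise an error is loading an optimizer state dict into an optimizer whose parameter list has a different length, for example when the checkpoint was written while the unused layer existed and is loaded after the layer was removed. The sketch below illustrates only that scenario. The make_model helper and the layer names are hypothetical, and the state dict is passed in memory rather than through torch.save and torch.load.

```python
import torch
import torch.nn as nn

def make_model(with_unused):
    # Hypothetical helper: a model with or without an extra, never-called layer.
    layers = {"used": nn.Linear(4, 2)}
    if with_unused:
        layers["unused"] = nn.Linear(4, 2)
    return nn.ModuleDict(layers)

# Train one step with the unused layer present and capture the optimizer state.
old_model = make_model(True)
old_opt = torch.optim.Adam(old_model.parameters())
old_model["used"](torch.randn(3, 4)).sum().backward()
old_opt.step()
checkpoint = old_opt.state_dict()

# Same model definition: the parameter group sizes match, so loading works.
same_opt = torch.optim.Adam(make_model(True).parameters())
same_opt.load_state_dict(checkpoint)

# Model without the unused layer: the group sizes differ and loading fails.
new_opt = torch.optim.Adam(make_model(False).parameters())
try:
    new_opt.load_state_dict(checkpoint)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(type(mismatch_error).__name__)  # ValueError
```

If this is what happened, keeping the model definition identical between saving and loading (or starting a fresh optimizer) would avoid the mismatch.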

mohit_kaushik · August 22, 2020, 12:28pm

Hi, I recently ran into an error where I created an nn.Linear layer but didn’t use it in forward.
It turns out that net.parameters() yields the parameters of this layer as well. So when I tried to update the weights using for p in model.parameters(): p -= p.grad * 0.001, I got TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'. I resolved this by commenting out the unused layer in __init__.

Since my loss did not depend on the unused layer, its gradients were None.
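An alternative to removing the layer is to skip parameters whose gradient is None in the manual update loop. This is a minimal sketch, not the poster's code: the two-layer ModuleDict and the layer names "used" and "unused" are made up for illustration, and the learning rate 0.001 is taken from the post above.

```python
import torch
import torch.nn as nn

# Hypothetical model: one layer that is used and one that never is.
model = nn.ModuleDict({"used": nn.Linear(4, 2), "unused": nn.Linear(4, 2)})
used_bias_before = model["used"].bias.detach().clone()
unused_weight_before = model["unused"].weight.detach().clone()

loss = model["used"](torch.randn(3, 4)).sum()
loss.backward()

lr = 0.001
with torch.no_grad():  # in-place updates on leaf tensors must not be tracked
    for p in model.parameters():
        if p.grad is None:
            continue  # parameter did not contribute to the loss
        p -= p.grad * lr

print(torch.equal(unused_weight_before, model["unused"].weight))  # True
print(torch.equal(used_bias_before, model["used"].bias))  # False
```

The built-in optimizers in torch.optim apply the same kind of check, which is why the unused layer causes no error when one of them is used instead of a hand-written loop.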

This is for pytorch_1.16_cpu